Abstract

Proof-Carrying  Code  (PCC)  is  a  general  framework  for  the  mechanical  verification 
of  safety  properties  of  machine-language  programs.  It  allows  a  code  producer  to 
provide  an  executable  program  to  a  code  consumer,  along  with  a  machine-checkable 
proof  of  safety  such  that  the  code  consumer  can  check  the  proof  before  running  the 
program.  PCC  has  the  advantage  of  small  Trusted  Computing  Base  (TCB),  since 
the  proof  checking  can  be  a  simple  mechanical  procedure.  A  weakness  of  previous 
PCC systems is that the proof-checking infrastructure is based on some complicated
logic  or  type  system  that  is  not  necessarily  sound. 

Foundational  Proof-Carrying  Code  (FPCC)  aims  to  further  reduce  the  TCB 
size  by  an  order  of  magnitude  by  building  the  safety  proof  based  on  the  simple  and 
trustworthy  foundations  of  mathematical  logic.  There  are  three  major  components 
in  an  FPCC  system:  a  compiler,  a  proof  checker,  and  the  safety  proof  of  an  input 
machine-language  program.  The  compiler  produces  machine  code  accompanied  by 
a  proof  of  safety.  The  proof  checker  verifies,  sometimes  also  reconstructs,  the  safety 
proof  before  the  program  gets  executed. 

We  have  built  a  prototype  system.  Our  prototype  is  the  first  end-to-end  FPCC 
system,  including  a  type-preserving  compiler  from  Core  ML  to  SPARC  (based  on 
SML/NJ),  a  low-level  typed  assembly  language  LTAL,  a  foundational  proof-checker 
Flit, and a nearly complete machine-checkable soundness proof. The system compiles
Core ML programs to SPARC code, accompanied by programs in a low-level
typed assembly language; these typed assembly programs serve as the proof witnesses
of the safety of the corresponding SPARC machine code.

In  this  thesis,  I’ll  explain  the  design  of  interfaces  between  these  components  and 
show how to build an end-to-end FPCC system. We have concluded that a type
system (a low-level typed assembly language) should be designed to check machine
code, and that the proof-checking should be factored into two stages, namely type-checking
of the input machine code and verification of soundness of the type system.
Since  a  type  checker  can  be  efficiently  interpreted  as  a  logic  program,  Flit  builds  in 
a simple logic programming engine which enables efficient proof-checking.

Chapter 1


Introduction 


During the 1960s, with the rapid advance of the hardware industry, the so-called “software
crisis” emerged. The software industry was not able to keep pace with the
rapid advance of hardware. Software projects were notoriously behind schedule and
over budget, and software products were unreliable and full of defects. While
programming languages and software engineering technology have improved considerably
since then, software is still extremely fragile: unreliable, insecure,
and full of bugs. Frederick Brooks explains “why programming is hard to manage”
in his book The Mythical Man-Month [Brooks, 1975], and many of his principles and
observations still apply today.


1.1  Software  Security:  A  Growing  Problem 

On  the  other  hand,  the  extensive  use  of  computers  and  the  accelerating  trends  of 
interconnectedness,  complexity,  and  extensibility  pose  an  increasing  demand  on  the 
security of software. While interconnected computers on the Internet make our life
easier, malicious code such as viruses and worms can exploit the vulnerabilities of software
and spread over the world in a minute. The complexity of software systems is
rising.  Large  and  complex  systems  tend  to  have  more  bugs  and  are  more  vulnerable 
to  malicious  code.  Many  of  today’s  software  systems  support  extensibility  through 
a  number  of  ways  such  as  scripting,  macros,  and  applets.  The  infamous  Melissa 
and  Love  Bug  viruses  took  advantage  of  the  Internet  and  the  macro  and  scripting 
extensions  of  the  Microsoft  Word  document  processing  program  and  Outlook  e-mail 
client  [McGraw  and  Morrisett,  2000;  Martin,  2000;  Slade,  1999]. 

As  our  society  becomes  increasingly  dependent  on  information  technology,  we 
must be able to produce software systems that are more secure, reliable, and dependable.
In this thesis, we describe a promising approach to addressing the program
safety problem and show how to build and verify secure software from a minimal
trusted computing base. Parts of this thesis work have been published at several conferences.
Chapter 3 is the extended version of a PLDI paper [Chen et al., 2003].
Chapter 4 is based on the techniques described in Appel and McAllester [2001], Wu
et al. [2003] and Tan et al. [2004]. Chapter 5 is the extended version of a PPDP
paper [Wu et al., 2003].

1.2  Classical  Security  Principles 

The  price  of  reliability  is  the  pursuit  of  utmost  simplicity. 

C.A.R.  Hoare 

To  design  secure  systems,  it  is  important  to  follow  well-known  design  principles. 
In this section, we review two classical security principles, namely the principle of
least privilege and the principle of minimum Trusted Computing Base (TCB). The
principle  of  minimum  TCB  is  one  of  the  important  criteria  used  to  measure  the 
trustworthiness  of  our  system. 

1.2.1  Principle  of  least  privilege 

The  principle  of  least  privilege  is  an  important  concept  in  computer  security.  It  was 
first  described  by  Saltzer  and  Schroeder  [1975]: 

Every  program  and  every  user  of  the  system  should  operate  using  the 
least  set  of  privileges  necessary  to  complete  the  job.  Primarily,  this 
principle  limits  the  damage  that  can  result  from  an  accident  or  error. 

It  also  reduces  the  number  of  potential  interactions  among  privileged 
programs  to  the  minimum  for  correct  operation,  so  that  unintentional, 
unwanted,  or  improper  uses  of  privilege  are  less  likely  to  occur. 

The principle of least privilege states that a user, a system, or a program should
be given no more privilege than necessary to perform a task. This minimizes the
damage that can occur if the code is exploited by a malicious user, since the
code holds only the minimum privilege.

The principle should be applied in every system where it is applicable. A good real-world
example of this principle is the US government “need to know” policy in the
security  clearance  system.  People  are  only  allowed  to  access  documents  that  are 
relevant  to  their  tasks. 

Many programs that run under UNIX systems (and other operating systems too)
violate the principle of least privilege. The Sendmail program is a classic example.
Sendmail runs with root permissions since it requires root privileges to set up a
service on port 25, the SMTP port. After the setup, Sendmail never gives up its
root privileges. Therefore, if an attacker can trigger a buffer overflow in Sendmail, the
attacker can trick Sendmail into running arbitrary code with root permissions.

1.2.2  Principle  of  minimum  trusted  computing  base 

The  Trusted  Computing  Base  (TCB)  is  the  set  of  hardware  and  software  that  needs 
to  be  trusted  for  the  security  of  a  task.  Since  nowadays  hardware  is  quite  reliable, 
trusted  software  systems  tend  to  be  the  most  significant  component  of  TCB. 

It  is  important  to  keep  the  TCB  small  and  simple  because,  in  general,  large  and 
complex systems tend to have more defects. In an investigation of Java-enabled
browsers, Dean et al. [1997] found on average one security-relevant
bug per 3,000 lines of source code in the first-generation implementations.
The TCBs of various Java Virtual Machines are between 50,000 and
200,000 lines of code [Appel and Wang, 2002]. The SpecialJ JVM [Colby et al.,
2000] reduces the TCB to 36,000 lines by using proof-carrying code. In this work,
we  will  show  how  to  reduce  the  size  of  the  TCB  to  under  3,000  lines  and  make  the 
proof  checker  small  and  simple  enough  to  be  manually  verifiable. 

1.3  Existing  Practices 

Traditional language-based techniques have focused on high-level code
safety. Recent research on Typed Assembly Languages (TAL) [Morrisett et al.,
1998, 1999a, b], Proof-Carrying Code (PCC) [Necula and Lee, 1996; Necula, 1997],
security types and information flow security [Sabelfeld and Myers, 2003], software
fault isolation [Wahbe et al., 1993], virtual machines [Lindholm and Yellin, 1996;
Platt, 2001], typed intermediate languages [Tarditi et al., 1996; Shao, 1997; Shao and
Appel, 1995; Chen et al., 2003], and certifying compilers [Colby et al., 2000; League
et al., 2003; Chen et al., 2003] has generated exciting results on low-level code
safety, demonstrating that language-based security is a promising technique for many
security problems, such as buffer overflow and format string attacks and information
leaks, and for building trustworthy and high-assurance systems.

In  the  following,  we  review  some  of  the  existing  techniques  for  ensuring  the 
reliability  and  safety  of  running  untrusted  code.  One  of  the  important  criteria  we 
used  to  compare  these  different  approaches  is  the  size  of  TCB. 

1.3.1  Authentication 

Users  may  install  and  run  untrusted  programs  based  on  authentication  from  some 
known and trusted party. A typical example is the dynamic software patch update
system for Microsoft Windows. Users download patches and
install them after checking the authentication (an electronic signature by Microsoft).
Strictly speaking, authentication does not guarantee any property of the authenticated
code. It only guarantees, based on cryptography, that the code is from some known
party. Authentication does not reduce the TCB size since it does not
guarantee any program property.

1.3.2  Virtual  memory  protection 

Modern computer systems use virtual memory to protect a process from other
processes by checking memory boundaries. The hardware and the operating system
cooperate to make sure that application programs do not bypass the virtual memory
API, which is usually implemented as OS system calls. Although virtual
memory is a very successful technique used in modern operating systems, it
is clumsy to implement and not flexible enough for situations other than memory
safety.

1.3.3  Software  fault  isolation 

Software Fault Isolation (SFI) [Wahbe et al., 1993] instruments machine code with
additional runtime checks to ensure some safety property. It allows cooperating
software modules to exist in the same address space and makes sure that they don’t
trash each other by inserting runtime checks on jumps and stores.
Applications such as extensible kernels and databases can benefit from SFI because
SFI provides an efficient way to run external programs safely via application isolation
in the same address space without context switch overhead. One disadvantage
of SFI is that it has some runtime overhead, and it is not very flexible for ensuring
properties other than memory safety.
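
To illustrate the kind of runtime check that SFI inserts before a store, the following sketch models address sandboxing: the upper bits of the target address are overwritten with the segment identifier of the module, so the store cannot leave that segment. The 8-bit/24-bit address layout and the names sandbox and checked_store are assumptions made for this illustration only; they are not taken from any particular SFI implementation.

```python
# Toy model of SFI address sandboxing on 32-bit addresses.
# Assumed layout: the top 8 bits name a segment, the low 24 bits are the offset.
SEGMENT_SHIFT = 24
OFFSET_MASK = (1 << SEGMENT_SHIFT) - 1


def sandbox(addr: int, segment_id: int) -> int:
    """Force addr into the given segment by rewriting its upper bits."""
    return (segment_id << SEGMENT_SHIFT) | (addr & OFFSET_MASK)


def checked_store(memory: dict, addr: int, value: int, segment_id: int) -> int:
    """Store through the sandboxed address; return the address actually written."""
    target = sandbox(addr, segment_id)
    memory[target] = value
    return target
```

In this model a store aimed at another segment is silently redirected into the module's own segment rather than rejected, and the extra masking on every store is one source of the runtime overhead mentioned above.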

1.3.4  Java  bytecode  verification 

The Java Virtual Machine (JVM) [Lindholm and Yellin, 1996] provides an additional
safety check at the bytecode level via a mechanism called Java bytecode verification.
In this framework, the bytecode, compiled from Java source code, is checked for
safety before execution. Users then do not need to trust the Java compiler, which
translates Java source programs into bytecode, since the safety of the bytecode is verified;
the Java compilers are not in the TCB. However, in practice, bytecode
is not interpreted, because interpretation is inefficient. Usually bytecode is compiled just in time to
machine code before execution. The Java Just-In-Time (JIT) compilers must be
trusted for correctness and safety. Note that production-quality JIT compilers are
usually large and complex.

1.3.5  Typed  assembly  languages 

In the Typed Assembly Language (TAL) [Morrisett et al., 1998, 1999a, b] framework,
a source program is compiled into a typed assembly program, which can be type
checked. Since the assembly code is type checked, users do not need to trust the
whole complicated (and maybe buggy) compiler anymore. Only the TAL type
checker, assembler and linker are in the TCB, which is much smaller than traditional
compilers. A weakness of most existing TAL systems is that their soundness is
not formally verified. In our foundational proof-carrying code project, we address
this problem by designing a low-level typed assembly language with a fully machine-checkable
soundness proof.

1.3.6  Proof-carrying  code 

Proof-Carrying Code (PCC) [Necula and Lee, 1996; Necula, 1997] is a general framework
for the mechanical verification of safety properties of machine-language programs.
It allows a code producer to provide an executable program to a host (code
consumer), along with a machine-checkable proof of safety such that the code consumer
can check the proof before running the program. PCC has the advantage
of  small  TCB,  since  the  proof  checking  can  be  a  simple  mechanical  procedure.  A 
weakness  of  previous  PCC  systems  is  that  the  proof-checking  infrastructure  is  too 
complex to prove sound using conventional techniques.

1.4 Foundational Proof-Carrying Code

Foundational  Proof-Carrying  Code  (FPCC)  [Appel,  2001]  aims  to  further  reduce 
the  TCB  size  by  an  order  of  magnitude  and  to  build  the  soundness  proof  based  on 
the foundation of mathematical logic. There are three main components in a foundational
proof-carrying code system: a compiler, a proof checker, and a safety proof
of the machine-language program compiled from a source program. The compiler
should  produce  machine  code  accompanied  by  a  proof  (hint)  of  safety.  The  proof 
checker  verifies,  sometimes  also  reconstructs,  the  safety  proof  before  the  program 
gets  executed. 

It is crucial to design appropriate interfaces between these components. This
thesis is on how to design interfaces between type-preserving compilers, foundational
proof checkers, and machine-checkable proofs, and on how to build an end-to-end
FPCC system. We have come to the conclusion that: (1) a typed assembly language
should serve as an interface between the proof-generating compiler and other
components; (2) logic programming (in some restricted way) is a good mechanism
for efficient proof-checking; and (3) a unified logical framework is convenient for
representing proofs, as well as specifying the safety theorem and machine semantics.
By using a unified logical framework, such as LF [Harper et al., 1993] and its
implementation Twelf [Pfenning and Schürmann, 1999, 2002], we can manipulate
proofs and specifications in the same language.

Our prototype system is the first end-to-end FPCC system, including a type-preserving
compiler from Core ML to SPARC [Chen, 2004] (based on SML/NJ [Appel
and MacQueen, 1987, 1991]), a low-level typed assembly language LTAL [Chen
et al., 2003], a foundational proof-checker Flit [Appel et al., 2002; Wu et al., 2003],
and a nearly complete machine-checkable soundness proof [Tan et al., 2004]. In the
following,  we  briefly  explain  some  of  the  design  choices  and  implementation  of  our 
system,  as  well  as  the  connections  and  interfaces  between  the  compiler,  the  proof 
checker,  and  the  soundness  proof. 

1.4.1  Typed  assembly  language  interface 

Typed  assembly  languages  provide  a  way  to  generate  machine-checkable  safety 
proofs  for  machine-language  programs.  But  most  existing  typed  assembly  languages 
[Morrisett  et  al.,  1999b, a]  either  don’t  have  soundness  proofs,  or  have  proofs  that 
are  hand-written  and  cannot  be  machine-checked,  which  is  worrisome  for  such  large 
calculi.  We  have  designed  and  implemented  a  low-level  typed  assembly  language 
(LTAL)  with  a  semantic  model  and  established  its  soundness  from  the  model.  LTAL 
serves  as  the  interface  between  the  proof-generating  (type-preserving)  compiler  and 
the proof-checking components. It is the language for the compiler to express proof
hints of the safety of input user programs. Compared to existing typed assembly
languages, LTAL is more scalable and more secure; it has no macro instructions
that hinder low-level optimizations such as instruction scheduling; its type constructors
are expressive enough to capture dataflow information, support the compiler’s
choice of data representations and permit typed position-independent code; and
its type-checking algorithm is completely syntax-directed. We encode the LTAL
type checker as a logic program that does not need to backtrack since the type
checking is completely syntax-directed; this has important implications for efficient
proof-checking that we will present later in Chapter 5. The details of LTAL are
presented in Chapter 3.

1.4.2 Logic programming interface

The interface and mechanism for proof-checking are as important as those for proof generation.
In  a  naively  designed  FPCC  system,  the  soundness  proof  could  be  100  times  larger 
than the machine program proved. Take the LTAL type system for example: it
has about 1000 type checking rules, among which about 50 are type checking rules
for instructions. On average, each instruction type checking rule has about 10
premises, with about 20 implicit variable bindings. Assume that we need 10 application
nodes and variable bindings to check each premise. Encoded in the Edinburgh
Logical Framework (LF) [Harper et al., 1993], this gives 120 application nodes per
machine instruction. For each instruction, there are also about 20 application nodes
for machine instruction decoding. If we encode each application node of LF with
3 words, the derivation is 420 times the size of the machine code. This is the size of the type
derivation tree, not including the size of the proof for each type checking rule. The
size of the type checking rules is a constant, though it could be large; this size can
be amortized away because the proof is checked once and for all.
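
The estimate above can be reproduced with a short calculation. The sketch below is one reading of the arithmetic that yields the quoted figures of 120 nodes and a factor of 420; the constant names are ours, and the assumption that each machine instruction occupies one word reflects SPARC's fixed 32-bit instruction format.

```python
# Back-of-the-envelope estimate using the figures quoted in the text.
PREMISES_PER_RULE = 10      # premises per instruction type checking rule
NODES_PER_PREMISE = 10      # application nodes and bindings per premise
IMPLICIT_BINDINGS = 20      # implicit variable bindings per rule
DECODING_NODES = 20         # nodes for machine instruction decoding
WORDS_PER_NODE = 3          # words used to encode one LF application node
WORDS_PER_INSTRUCTION = 1   # assumption: one 32-bit word per SPARC instruction


def typing_nodes() -> int:
    """Application nodes needed to type check one instruction."""
    return PREMISES_PER_RULE * NODES_PER_PREMISE + IMPLICIT_BINDINGS


def blowup_factor() -> int:
    """Size of the type derivation relative to the machine code."""
    total_nodes = typing_nodes() + DECODING_NODES
    return total_nodes * WORDS_PER_NODE // WORDS_PER_INSTRUCTION
```

Here typing_nodes() evaluates to 10 * 10 + 20 = 120 and blowup_factor() to (120 + 20) * 3 = 420.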

In addition to the huge proof witness problem, untrustworthy proof rules are another
problem in previous PCC systems. In both Necula’s PCC and Morrisett’s
TAL systems, type checking rules are trusted as axioms; the type systems used in
their systems do not have a machine-checked soundness proof. Any misunderstanding
of the semantics of type checking or proof rules could lead to errors in the type
system.  League  et  al.  [2003]  found  an  unsound  proof  rule  in  the  SpecialJ  [Colby 
et  al.,  2000]  type  system.  In  the  process  of  refining  our  own  TAL  [Chen  et  al.,  2003], 
we  routinely  find  and  fix  bugs  that  can  lead  to  unsoundness. 


No previous design has addressed both of these problems simultaneously. We
show the theory, design, and implementation of a proof-checker that permits small
proof witnesses and machine-checkable proofs of the soundness of the system. The
general approach is to write a logic program that has a machine-checked semantic
correctness proof. The logic program encodes the type checking rules of the type
system (typed assembly language) for checking machine code. First, the correctness
proof of the logic program is checked by a proof-checker; in our system, it is the
LF proof-checker component of Flit. Then, the logic program is interpreted by a
logic programming engine to check the input machine code. Our checker Flit has a
simple logic programming engine that can efficiently interpret the LTAL type checker
as a logic program. This technique can be used in other domains (besides “proof-carrying”)
to write logic programs with machine-checked guarantees of correctness.
The details of the efficient and foundational proof checking techniques are presented
in Chapter 5.
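
To see why a syntax-directed type checker can run as a logic program without backtracking, consider the following minimal sketch. The three-instruction language and its types are hypothetical and far simpler than LTAL; the point is only that each opcode selects exactly one rule, so checking is a single forward pass in which a failed premise rejects the program instead of triggering a search for alternative rules.

```python
# Toy syntax-directed checker: each opcode selects exactly one rule,
# so checking is a single forward pass with no backtracking.
# The instruction set and type names are hypothetical, not LTAL.

def check_instr(instr, env):
    """Return the register typing after instr, or raise TypeError."""
    op = instr[0]
    if op == "li":                      # li rd, constant  =>  rd : int
        _, rd, _const = instr
        return {**env, rd: "int"}
    if op == "add":                     # add rd, rs1, rs2 needs rs1, rs2 : int
        _, rd, rs1, rs2 = instr
        if env.get(rs1) == "int" and env.get(rs2) == "int":
            return {**env, rd: "int"}
        raise TypeError(f"add expects int operands: {instr}")
    if op == "mov":                     # mov rd, rs copies the type of rs
        _, rd, rs = instr
        if rs in env:
            return {**env, rd: env[rs]}
        raise TypeError(f"mov from untyped register: {instr}")
    raise TypeError(f"no rule for opcode {op!r}")


def check_program(program, env=None):
    """Thread the register typing through the program, one rule per instruction."""
    env = dict(env or {})
    for instr in program:
        env = check_instr(instr, env)
    return env
```

For example, check_program([("li", "r1", 1), ("li", "r2", 2), ("add", "r3", "r1", "r2")]) succeeds, while a program that adds two untyped registers is rejected at the offending instruction.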

1.4.3  Logical  framework 

To  develop  machine-checkable  proofs,  one  must  first  choose  a  logic  and  a  logical 
framework  in  which  we  encode  and  manipulate  objects  of  the  logic  chosen.  Our  proof 
and  specification  of  machine  semantics  and  safety  properties  are  based  on  higher- 
order  logic.  It  is  convenient  to  use  the  same  representation  for  logics,  theorems,  and 
proofs. We choose the LF logical framework [Harper et al., 1993] to encode and
manipulate higher-order logic.

LF is a dependently-typed λ-calculus with type families and βη-equality. It
has  three  levels  of  terms:  objects,  types,  and  kinds.  Types  classify  objects  and 
kinds  classify  type  families.  LF  provides  a  convenient  tool  for  defining  logics,  with 
the support of higher-order abstract syntax. The framework is general enough to
represent logics of interest; we use it to encode higher-order logic. For development,
we use Twelf [Pfenning and Schürmann, 1999, 2002], an implementation of LF.
Twelf has many useful features, such as type reconstruction and mode analysis,
which make it a convenient tool for us to develop machine-checkable proofs in LF.

While  Twelf  is  very  useful  for  developing  proofs  in  LF,  it  is  not  minimal  in  terms 
of system size and features. Many advanced features, such as type inference and
the Emacs interface, are not needed for proof checking at the user site, though these
features are very useful for development. Thus if we use Twelf as the ultimate proof
checker, it will violate the “pay as you go” principle, and users have to trust
components that are not actually needed for the proof checking task. For efficient
and  trustworthy  proof  checking  in  LF,  we  have  developed  our  own  LF  proof  checker 
called  Flit,  which  is  presented  in  Chapter  5. 

1.5  Thesis  Outline 

The  remainder  of  the  thesis  is  organized  as  follows.  Chapter  2  gives  an  overview  of 
our  foundational  proof-carrying  code  system.  In  Chapter  3,  we  introduce  the  LTAL 
interface,  including  its  syntax,  semantics,  type  checking  rules,  and  measurements. 
In  Chapter  4,  LTAL  is  given  a  semantic  model,  and  thus  the  machine-checkable 
soundness  proof  of  the  LTAL  calculus  is  presented.  In  Chapter  5,  we  present  the 
proof-checking mechanism; we use a simple system to illustrate the interfaces between
different components including the proof checker. Finally, we summarize and
discuss future work in Chapter 6.


Chapter  2 


Foundational  Proof-Carrying  Code 


Everything  should  be  made  as  simple  as  possible,  but  no  simpler. 

Albert  Einstein 

Necula’s PCC system [Necula, 1997] constructs for untrusted code a verification
condition (VC), which has the property that if the VC holds with regard to the logic
axioms and the typing rules, the program is safe. A VC generator (VCGen) is used
by both the code producer and the code consumer to construct VCs. VCGen examines
a machine-code program instruction by instruction and calculates the weakest
precondition for each instruction in Hoare-logic style. This VC-based verification
builds the type system and machine instruction semantics into the algorithm for
formulating the safety predicate. VCGen must be trusted to generate the right
formula, but it is a large program (23,000 lines of C code [Appel and Wang, 2002]),
and thus it is difficult to guarantee that it is bug-free.
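
As a rough illustration of the weakest-precondition calculation that a VC generator performs, the sketch below handles only straight-line assignments and models predicates as Python functions on a state. It is a textbook Hoare-logic construction under our own naming (wp_assign, wp_sequence), not a description of Necula's VCGen, which works on machine code and builds in the type system.

```python
# Weakest preconditions for straight-line assignments, Hoare-logic style.
# Predicates are modelled as Python functions from a state (dict) to bool.

def wp_assign(var, expr, post):
    """wp(var := expr, post): post must hold in the state after the update."""
    return lambda state: post({**state, var: expr(state)})


def wp_sequence(assignments, post):
    """Fold wp backwards over a list of (var, expr) assignments."""
    for var, expr in reversed(assignments):
        post = wp_assign(var, expr, post)
    return post
```

For the program x := x + 1; y := x * 2 with postcondition y < 10, the computed precondition holds in a state with x = 3 and fails in a state with x = 4.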

2.1 FPCC

The  motivation  of  Foundational  PCC  is  to  make  the  TCB  as  small  as  possible, 
without  committing  to  any  specific  type  system.  We  believe  that  the  smaller  the 
TCB, the more confidence PCC users can have. Our TCB consists of the specification
of the safety policy, machine instruction semantics, and the proof checker.
In the current implementation, it is about 3,000 lines of code [Appel et al., 2002;
Wu et al., 2003], of which about half is the specification of the SPARC instruction
set  architecture.  To  make  the  TCB  minimal,  we  choose  Church’s  higher-order  logic 
with  a  few  axioms  of  arithmetic,  give  types  a  semantic  model  to  move  the  type 
system  out  of  the  TCB,  and  model  machine  instructions  by  a  step  relation  between 
machine  states;  we  avoid  VCGen  entirely  [Appel  and  Felty,  2000]. 
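
The idea of modeling instructions by a step relation can be sketched on a toy machine. In the sketch below, which is our own illustration rather than the SPARC specification used in the system, a state has no successor when the next operation is not permitted, and we assume the common formulation in which a program is safe if execution never reaches such a stuck state.

```python
# Toy machine: a state is (pc, registers); step returns the next state,
# or None when no transition exists (the state is "stuck").
# The two-instruction machine and the safety policy are hypothetical.

def step(program, state):
    pc, regs = state
    if pc not in program:
        return None                      # no instruction to fetch: stuck
    op, *args = program[pc]
    if op == "inc":
        (r,) = args
        return (pc + 1, {**regs, r: regs[r] + 1})
    if op == "jmp":
        (target,) = args
        return (target, regs)
    return None                          # illegal instruction: stuck


def safe_for(program, state, n):
    """True if the machine can take n steps from state without getting stuck."""
    for _ in range(n):
        state = step(program, state)
        if state is None:
            return False
    return True
```

A safety proof in this style would establish that safe_for holds for every n. The function above can only test a fixed number of steps, which is why a proof is needed rather than execution.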

In  order  to  support  contravariant  recursive  datatypes  and  mutable  fields,  we 
model types as predicates on states, approximation indices [Appel and McAllester,
2001], and type levels [Ahmed et al., 2002]. We have an abstraction layer, Typed
Machine Language (TML) [Swadi and Appel, 2001; Swadi, 2003], to hide the complex
semantic models for types. TML provides a rich set of constructors for types,
type maps, and instructions, and an orthogonal set of primitive type constructors
such as union, intersection, existential and universal quantification, and so on.
TML  is  so  expressive  that  its  type-checking  is  undecidable;  it  is  more  a  logic  than 
a  type  system.  However,  it  is  very  useful  for  building  semantic  models  of  higher- 
level,  application-specific  type  systems  such  as  LTAL:  We  give  LTAL  constructors 
a  semantic  model  in  terms  of  TML. 

The FPCC framework is shown in Figure 2.1. A source program is compiled into
a machine-code program and an LTAL program. The code consumer receives the
LTAL rules, along with their soundness proof; checks the soundness proof [Appel
et al., 2002; Wu et al., 2003]; and then runs the LTAL checker, which is a simple
computation (like Prolog but without backtracking).

[Figure 2.1: Foundational PCC framework. Trusted components are shaded. The diagram shows an ML program compiled into an LTAL program and machine code.]


2.2  The  LTAL  Interface 

The  idea  of  Proof-Carrying  Code  [Necula,  1997]  is  that  the  compiler  should  produce 
machine  code  accompanied  by  a  proof  of  safety.  A  weakness  of  previous  PCC 
systems  is  that  the  proof-checking  infrastructure  is  too  complex  to  prove  sound  by 
conventional  techniques.  We  have  built  the  first  compiler  that  produces  machine 
code  accompanied  by  safety  proofs  that  are  machine-checkable  in  a  simple  logic  from 
minimal  axioms. 

Most  PCC  compilers,  including  ours,  are  based  on  typed  intermediate  languages 
or typed assembly language [Morrisett et al., 1998, 1999b], which provide a way to
generate safety proofs automatically. TAL has a soundness guarantee: If a TAL
program type-checks and there is no bug in the assembler, the machine code is safe
to execute. Soundness is proved as a metatheorem outside of the proving system;
the proof is hand-written and not machine-checkable. The typing rules and the type
checker are in the trusted computing base, that is, bugs in these components can let
unsafe code slip past the checker. There have been many variants of TAL [Morrisett
et al., 1999a; Xi and Harper, 2001; Morrisett et al., 2002], which rely on similar
soundness metatheorems. A recent variant, TALT [Crary, 2003], has a machine-checkable
metatheorem, which moves the typing rules and the type checker out of
the TCB. The metatheorem proof checker, such as Twelf [Pfenning and Schürmann,
1999, 2002] used by TALT, is usually quite a big program and has to be trusted.
It is hard to manage the soundness proofs and avoid errors when scaling up
to realistic type systems for real compilers. The goal of our Foundational Proof-Carrying
Code (FPCC) [Appel, 2001] project is to build machine-checkable safety
proofs for machine-code programs from the minimal set of axioms. We have designed
a low-level typed assembly language (LTAL) to be the interface between the
compiler  and  the  checker:  The  compiler  compiles  a  source  program  to  machine  code 
annotated  by  an  LTAL  program.  LTAL  annotations  are  not  in  the  machine  code, 
so  they  don’t  increase  machine  code  size  or  execution  time. 

The  soundness  of  LTAL  typing  rules  is  proved  not  by  a  metatheorem  as  in  TAL, 
but by their semantic model [Tan et al., 2004], bottom up: First we use higher-order
logic with axioms for arithmetic to prove lemmas about machine instructions
and  types,  then  we  prove  the  TML  typing  rules  based  on  these  lemmas,  then  we 
prove  the  soundness  of  LTAL  typing  rules  in  the  TML  model.  Each  typing  rule  is 
represented  as  a  derived  lemma  in  our  logic. 

LTAL benefits from its semantic model in several respects. First, it is more scalable:
adding new rules that can be described in our semantic model generally does
not affect the soundness of existing rules, which we found very useful in evolving
the design. Second, it is more secure because the typing rules are moved out of
the TCB. Third, TML connects LTAL to real machine instruction semantics, thus
bridging the gap between typed assembly language and machine language.

LTAL is not intended as a universal TAL. Instead, it is extensible. Our semantic
modeling technique is very modular. New operators can be added to LTAL (and
proved sound) without disturbing the soundness proofs for existing operators, as
long as the new operators conform to the assumptions in the semantic model. We
started with a very simple model [Appel and Felty, 2000], and when we added
contravariant recursive types [Appel and McAllester, 2001] and mutable record fields
[Ahmed et al., 2002] these changes did violate previous assumptions and required
nonmodular rewrites. But now our model is very powerful and general: None of
the existing LTAL soundness proofs will need to be touched when we add operators
to handle extensible sums, various kinds of exception handling mechanisms, various
kinds of multidimensional arrays (with or without pointer indirections), or arbitrary
predicates on scalar values.


2.3  FPCC/ML  Compiler 

The FPCC/ML compiler, built by Chen and Fang [Chen et al., 2003], transforms
core ML (ML without the module system) into SPARC code with LTAL annotations.
At present our prototype omits exceptions and strings. The compiler is based on
the Standard ML of New Jersey (SML/NJ) system [Appel and MacQueen, 1987,
1991].

There are several stages. The front end of SML/NJ translates source ML programs
to FLINT (a typed intermediate language based on $F_\omega$) [Shao, 1997]; we
have reused the FLINT front end. The newly built typed CPS-conversion and
closure-conversion phases, built by Hai Fang, generate NFLINT (a typed intermediate
language like Morrisett's $\lambda^C$ [Morrisett et al., 1998, 1999b]). The next few phases,
built by Juan Chen, break down complex instructions, build basic blocks, and insert
coercions to get machine-independent LTAL programs. The back end, also by Juan
Chen, takes machine-independent LTAL and produces machine code with
machine-specific LTAL annotations and some auxiliary information, such as the mapping from
labels to their addresses.

SML/NJ's back end uses the untyped MLRISC retargetable instruction selection,
register allocation, and low-level optimization software [George, 1997]. The
difficulty is to make MLRISC preserve and manipulate type information without
rewriting MLRISC or making it dependent on our particular type system. Fortunately,
MLRISC already had some support for an annotation mechanism [Leung
and George] that permits "comments" on the instructions; we have generalized this
mechanism and used it to propagate types.

2.4  Checker 

Our checker has two main components. First, it uses a simple LF type-checker
to check a proof, in higher-order logic, of the soundness of the LTAL typing rules
[Appel et al., 2002; Wu et al., 2003]. We can view these LTAL rules as a set of
lemmas.

Second, the LTAL rules can be regarded as a set of Prolog-like clauses.
Because these rules are syntax-directed, the checker can run a very simple
interpreter for a subset of Prolog (without backtracking) on these rules to type-check the
machine-language program [Wu et al., 2003].
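
To illustrate why syntax-directed rules remove the need for backtracking, here is a minimal Python sketch. It is not the Flit checker: the toy instruction forms `mov` and `add`, the table `RULES`, and the string-valued types are all hypothetical. The only point it makes is that when each instruction form has exactly one applicable rule, rule selection is a lookup, and a failed premise means rejection rather than a search for alternatives.

```python
# Toy illustration (hypothetical instruction set, not LTAL itself):
# every instruction form has exactly one rule, so the checker dispatches
# on the head symbol and never needs to backtrack.

def rule_mov(env, dst, src):
    # Premise: src is already typed. Conclusion: dst receives src's type.
    if src not in env:
        return None
    return {**env, dst: env[src]}

def rule_add(env, dst, a, b):
    # Premises: both operands are ints. Conclusion: dst is an int.
    if env.get(a) != "int" or env.get(b) != "int":
        return None
    return {**env, dst: "int"}

# One rule per head symbol: choosing a rule is a dictionary lookup.
RULES = {"mov": rule_mov, "add": rule_add}

def check(env, program):
    """Return the final typing environment, or None if checking fails."""
    for head, *args in program:
        rule = RULES.get(head)
        if rule is None:
            return None   # no rule applies: reject immediately
        env = rule(env, *args)
        if env is None:
            return None   # a premise failed: there is no alternative to try
    return env
```

For example, `check({"x": "int", "y": "int"}, [("add", "z", "x", "y")])` succeeds and binds `z`, while a program that uses an untyped operand is rejected at the first failing premise.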

The LTAL program is only an untrusted hint, so that the checker can take advantage
of type and dataflow information from the compiler in proving the safety of
the machine code. Running the checker on machine code and the
corresponding LTAL program is like type-checking the machine code according to
the structural information from the LTAL program. The overall goal of the checker
is judge_prog(H, P), where P is the binary code (a sequence of instruction words)
and H is the corresponding LTAL program. The predicate judge_prog characterizes
well-typedness. The checker solves this goal according to the structure of H. In the
underlying semantic model, we can prove that well-typedness implies safety:

$$\text{judge\_prog}(H, P) \rightarrow \text{safe}(P).$$

The predicate safe is the machine-level safety policy. When the checker succeeds
on the goal judge_prog(H, P), we apply this lemma to get a proof of safe(P).

2.5  Safety  Proof 

The safety proof for input machine code consists of two parts: the static proof of
the LTAL type checking rules and the syntactic type derivation. We have built a
semantic model for the LTAL type system based on mathematical logic and the
machine semantics, and proved each LTAL type checking rule as a lemma in our
system. This static proof of the soundness of LTAL is checked once and for all. The
syntactic LTAL type checking rules are interpreted by the checker to verify that a
type derivation exists, that is, that the input program is typeable in LTAL. The checker
does not actually build the huge proof, i.e., the type derivation; it just makes sure
that one exists.

We  encode  the  LTAL  type  checking  rules  in  LF,  and  prove  them  as  lemmas. 
After  checking  the  validity  of  the  proofs  of  lemmas,  we  strip  off  the  proofs  and 
interpret  the  set  of  rules  as  a  logic  program  in  the  Flit  checker.  If  the  logic  program 
terminates  with  a  positive  result,  a  type  derivation  exists.  That  is,  there  exists  a 
machine-checkable  safety  proof  of  the  input  program,  although  we  actually  didn’t 
fully  build  it. 


Chapter  3 


Low-Level Typed Assembly Language


We have designed our own typed assembly language LTAL because we want to
generate safety proofs of machine code, with as much flexibility as possible for an
optimizing compiler. Thus, even part-way through a sequence of instructions that
allocates on the heap or that does datatype-tag discrimination, the LTAL type system
must be able to describe the machine state. That is, LTAL has no "macro"
instructions: Each LTAL instruction corresponds to one SPARC instruction or is
a coercion with no runtime effect. Because no sequence of instructions is unbreakable,
low-level optimizations such as instruction scheduling are permissible (however,
at present our LTAL does not accommodate the filling of branch-delay slots on
SPARC). Macro instructions in other TALs (such as malloc and test-and-branch)
that expand to a fixed sequence of machine instructions interfere with low-level
optimization.

TAL systems (rows of the comparison table): SpecialJ [Colby et al., 2000];
TALx86 [Morrisett et al., 1999b]; DTAL [Xi and Harper, 2001]; FTAL [Hamid et al., 2002];
TALT [Crary, 2003]; Open Verifier [Chang et al., 2005]; Our LTAL [Chen et al., 2003].

TAL features (columns 1 to 12 of the comparison table):

1 Compiles "real" source language
2 Compiles to real target machine
3 Foundational specification
4 Machine-checked soundness proof
5 Minimal checker
6 Atomicity
7 Compiler can choose data representations
8 Dataflow analysis
9 Position-independent code
10 Basic blocks
11 Syntax-directed checking
12 Flexibility

Keys: ○ partially; ◐ nearly; ● completely


Figure  3.1:  Comparison  of  TAL  and  PCC  systems.  (The  table  is  based  on  Chen 
et  al.  [2003].  The  status  of  TALT,  the  last  column,  and  the  entry  for  the  Open 
Verifier  are  new.) 


3.1  LTAL  Features 


Our design and implementation have the following desirable properties, some of which
are shared by some other TAL and PCC systems (see Figure 3.1):


•  Compiles  a  “real”  source  language.  We  have  built  a  compiler  for  almost 
all  of  core  ML — a  full-scale  source  language  with  polymorphic  higher-order 
functions,  disjoint-sum  recursive  datatypes,  and  so  on. 

• Compiles to a real target machine. We generate high-quality SPARC
code. Our type-preserving compiler is based on the SML/NJ system [Appel
and MacQueen, 1987, 1991].

• Foundational specification. We have a concise logical specification, independent
of any type system, of the safety property guaranteed by our system:
In our prototype we guarantee memory safety and that only a certain subset
of SPARC instructions will be executed [Appel, 2001]. Furthermore, our specification
relates to the actual machine language to be executed, not assembly
language: we model and check instruction encodings explicitly.

• Machine-checked proof. We have a machine-checked proof (mostly finished)
of the soundness of our system: if the LTAL program type-checks, the
machine code is safe. Unlike any other TAL or PCC system, our proof is
with respect to a minimal set of axioms, the largest part of which is a logical
specification of the instruction set architecture of the SPARC processor.

• Minimal checker. Just in case you are worried about bugs (or Trojan horses)
in proof checkers, our soundness proof is checkable in a very minimal logic:
The trusted base of our system (including axioms, machine specification, and
a C program implementing the LF checking and a simple logic programming
engine) is about 3034 lines of code [Appel et al., 2002; Wu et al., 2003], an
order of magnitude smaller than other systems.

• Atomicity. Some other TALs have "macro" instruction sequences (or even
worse, calls to the runtime system) for compare-and-branch, datatype
tag-checking, or memory allocation. This inhibits optimizations such as hoisting
and scheduling.1 Each of our LTAL instructions corresponds to at most one
machine instruction. Some LTAL instructions are only for type coercion, and
do not correspond to any real machine instructions. Because no sequence of
instructions is unbreakable, low-level optimizations such as instruction scheduling
are permissible.

• Compiler can choose data representations. For data structures such as
tagged disjoint sums, a compiler may want to exercise discretion in choosing
data layouts, unhampered by assumptions built into a typed assembly
language. LTAL permits this flexibility; some other TALs do not.

• Dataflow & induction analysis. LTAL includes existential and singleton
types that are powerful enough to permit dataflow-based safety proofs of optimized
machine code (though our prototype compiler does not exploit all of
this power yet).

• Position-independent code. To avoid the need to trust a linker, we show
how to check typed position-independent code, even in the presence of long
jumps and of operations that move code addresses into pointer variables and
closures.

• Basic blocks. LTAL groups instructions into basic blocks, making it easy
for an optimizing compiler to reorder blocks to optimize cache placement or
shorten span-dependent instructions.

• Syntax-directed. Typechecking LTAL is completely syntax-directed. As we
will describe later, the LTAL type checker can be encoded as a simple logic
program that does not need to backtrack, which has important implications
for efficient proof checking.

• Flexibility. Our framework is very flexible. Many of the LTAL features are
orthogonal, which makes LTAL easily extensible. We believe LTAL can be
extended to compile other source languages on different architectures without
much difficulty.

1 These optimizations can be done in the assembler, but then they need to be trusted to be
bug-free, whereas our system does not need to trust them.

$$
\begin{array}{llcl}
\text{Kinds} & \kappa & ::= & \mathsf{Numeric} \mid \mathsf{Scalar} \mid \Omega \\
\text{Types} & \tau & ::= & \text{(See Figure 3.4)} \\
\text{Condition Codes} & cc & ::= & \mathsf{cc\_cmp}(\tau_1, \tau_2) \mid \mathsf{cc\_testbox}(\ldots) \mid \mathsf{cc\_testmem}(m) \mid \mathsf{cc\_none} \\
\text{Values} & v & ::= & x \mid i \mid l \mid c(v) \mid \mathsf{vdiff}(l_1, l_2) \\
\text{Coercions} & c & ::= & \text{(See Figure 3.7)} \\
\text{Arithmetic Operators} & op & ::= & + \mid - \mid * \mid / \\
\text{Arithmetic Compares} & \pi & ::= & = \mid \neq \mid < \mid \leq \mid > \mid \geq \\
\text{Instructions} & \iota & ::= & \text{(See Figure 3.16)} \\
\text{Basic blocks} & B & ::= & l[\vec{\alpha} : \vec{\kappa}](m, cc, v_1 : \tau_1, \ldots, v_n : \tau_n) = \iota_1; \ldots; \iota_k \\
\text{Environments} & LRT & ::= & (L, R, T) \\
\quad \text{label map} & L & ::= & \{l_1 \mapsto a_1, \ldots, l_n \mapsto a_n\} \\
\quad \text{register map} & R & ::= & \{x_1 \mapsto r_1, \ldots, x_n \mapsto r_n\} \\
\quad \text{type abbreviation map} & T & ::= & \ldots \\
\text{Program} & P & ::= & (LRT, \vec{B})
\end{array}
$$

Figure 3.2: LTAL syntax: Overview.


3.2  Syntax  Overview 


LTAL is a calculus with conventional features such as variable names and scoping
rules. This is unlike other TALs, which use registers and memory locations directly
instead of variables. The LTAL syntax is shown in Figures 3.2, 3.4, 3.7, and 3.16.

LTAL supports first-order kinds; it has only limited support for higher-order
kinds, since TML does not model higher-order kinds in full generality. For core
ML, this is enough. The kind Numeric classifies singleton numeric types. The kind
Scalar classifies types that are scalar under the TML semantic model. Most types
presented in Section 3.4 are of kind Scalar. However, in our semantic model, it is
convenient to describe the register bank or typing environment $\phi$ as a type. The
register bank type and environment types are not scalar; the scalar types only care
about the first element of the vector in the semantic model. The kind Numeric is a
sub-kind of Scalar. All other types are of kind $\Omega$.

LTAL has a set of standard types: type variables,2 top and bottom types, integer
types, existential types, and recursive types (see Figure 3.4 for the detailed
description of the LTAL types). There are low-level constructors to model high-level
abstractions, such as the singleton integer type $n$ and the refined integer type
$\mathsf{int}_\pi(n)$ for integers ($i$ has type $\mathsf{int}_\pi(n)$ means that $i \mathrel{\pi} n$ is true, where $\pi$ is a
predicate on integers such as $=$ or $<$), as well as field types, intersection types, and
union types for records and user-defined datatypes.
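
The reading of singleton and refined integer types as predicates on integers can be sketched in a few lines of Python. This is only an illustration of the intended meaning, not part of LTAL or its semantic model; the table `PREDICATES` and the function names are hypothetical.

```python
import operator

# Hypothetical encoding of a few comparison predicates pi.
PREDICATES = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

def has_refined_type(i, pi, n):
    """True when integer i inhabits int_pi(n), i.e. when `i pi n` holds."""
    return PREDICATES[pi](i, n)

def has_singleton_type(i, n):
    """The singleton type n is inhabited only by the integer n itself."""
    return i == n
```

Under this reading, 10 has the refined type written with `>` and 9, whereas 9 itself does not.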

To model basic blocks (with their live variables) and functions (with their formal
parameters) we have polymorphic "code pointer" types
$\mathsf{codeptr}[\vec{\alpha} : \vec{\kappa}](m, cc, v_1 : \tau_1, \ldots, v_n : \tau_n)$,
where $\vec{\alpha} : \vec{\kappa}$ is a list of type variables with kinds, $m$ is the available
memory size known at this point, $cc$ is the condition code requirement, and $v_i : \tau_i$
are the input arguments and types.

For label arithmetic and position-independent code type checking, we have type
constructors addr and diff, which will be further explained in Sections 3.9 and 3.10.

Type def refers to a type expression by a name; in our implementation, names
are just integers. Each program can have a sequence of type abbreviations that give
names to type expressions. This mechanism makes LTAL programs concise, and
saves the checker some work. The checker expands a name to the type expression
it stands for only when such expansion is needed. Otherwise, the checker simply
passes the name around, which is more efficient than passing the type expression.
The body of a type definition could be an open type expression with free variables.
Type variables cannot be used for this purpose since they usually stand for closed
type expressions.

2 In our implementation we use de Bruijn indices, but for presentation purposes, we sometimes
show named variables.
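
The idea of expanding an abbreviation only when needed can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the checker's actual representation: types are modeled as nested tuples, `("def", k)` stands for abbreviation number `k`, and the table `ABBREVS` and the function `same_type` are hypothetical. The sketch also assumes abbreviations are not cyclic.

```python
# Types are nested tuples; ("def", k) refers to abbreviation number k.
# The table below is a made-up example.
ABBREVS = {
    0: ("field", 0, ("int",)),
    1: ("intersect", ("def", 0), ("field", 1, ("int",))),
}

def same_type(t1, t2, abbrevs):
    """Structural equality that opens an abbreviation only when needed."""
    if t1 == t2:
        return True   # identical names or terms: no expansion required
    if t1[0] == "def":
        return same_type(abbrevs[t1[1]], t2, abbrevs)
    if t2[0] == "def":
        return same_type(t1, abbrevs[t2[1]], abbrevs)
    if t1[0] != t2[0] or len(t1) != len(t2):
        return False
    for a, b in zip(t1[1:], t2[1:]):
        if isinstance(a, tuple) and isinstance(b, tuple):
            if not same_type(a, b, abbrevs):
                return False
        elif a != b:
            return False
    return True
```

Comparing `("def", 1)` with itself returns immediately without touching the table, which is the saving the paragraph above describes; the table is consulted only when a name meets a structurally different term.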

We  have  a  special  category  cc  to  capture  the  condition  code  status  (on  machines 
with  condition  codes),  which  includes  cc_cmp  for  comparison,  cc_testbox  for  testing 
whether  or  not  a  type  is  boxed,  cc_testmem  for  memory  availability  testing,  and 
cc_none  for  arbitrary  status. 

A value can be a variable $x$, an immediate integer $i$, a label $l$, a coerced value
$c(v)$ (where $c$ is a coercion), or a vdiff value. Values and their typing are further
explained in Section 3.5.

Coercions  are  used  to  change  the  type  of  values;  all  coercions  are  free  of  runtime 
effect,  as  they  follow  subtyping  relations  in  the  underlying  model.  Many  of  these 
coercions  are  conventional,  such  as  identity,  composition,  pack,  fold/unfold,  inject, 
and  project.  Coercion  rules  are  further  discussed  in  Section  3.6. 

LTAL has a machine-independent core, which includes: move and ALU instructions,
sethi for loading large integers, store and load instructions, addradd for address
arithmetic, select for loading a record field, gettag for loading the tag field of
a sum type value, init, record, and inc-allocptr for heap allocation, call for jumping
to some label, and calln for "call by fall-through" (which generates no code). Each
target machine requires the addition of machine-specific operators and rules. The
instructions in $\mathrm{LTAL}_{\mathrm{sparc}}$ that are specific to machines with condition codes are:
cmpcc compares two integers and sets condition codes; cmpcci compares a value
with a compile-time-known integer, sets condition codes, and refines the type of the
value; testbox tests if a value is boxed; testmem tests for out-of-heap; if is the normal
conditional branch without type refinement; iff is a conditional branch with type
refinement in both branches; ifboxed refines types for boxedness of the value (of a
sum type); and iffull and iftag specialize type refinement for memory allocation and
datatype tag discrimination, respectively. The LTAL instructions and their typing
rules will be further discussed in Section 3.9.

Function declaration $l[\vec{\alpha} : \vec{\kappa}](m, cc, v_1 : \tau_1, \ldots, v_n : \tau_n) = \iota_1; \ldots; \iota_k$ defines a
function (basic block) with label $l$, type parameters $\vec{\alpha} : \vec{\kappa}$, formal parameters
$v_1 : \tau_1, \ldots, v_n : \tau_n$, and function body $\iota_1; \ldots; \iota_k$, which is a sequence of LTAL instructions.
The number $m$ specifies how much memory is guaranteed to be available when the
function is called. It is a compile-time known constant. If a function specifies 16
words and allocates no more than 16 words, for example, there is no need to test
the memory availability. Otherwise, it has to check explicitly if there is enough
memory. The condition-code requirement $cc$ specifies the status of condition codes
when the function is called. The function label $l$ is assigned a code pointer type
$\mathsf{codeptr}[\vec{\alpha} : \vec{\kappa}](m, cc, v_1 : \tau_1, \ldots, v_n : \tau_n)$. Each function is closed in the sense that
there are no free type variables or value variables.
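
The role of the memory guarantee $m$ can be illustrated with a small Python sketch. The instruction representation here is an assumption made for the example only: each instruction is a pair `(op, n)`, and a `("record", n)` instruction is taken to allocate `n` words. Neither the representation nor the function names come from LTAL.

```python
def words_allocated(instructions):
    # Assumption for this sketch: ("record", n) allocates n heap words,
    # and every other instruction allocates nothing.
    return sum(n for op, n in instructions if op == "record")

def needs_memory_test(m, instructions):
    """True when a block allocates more than its guaranteed m words,
    so it must test memory availability explicitly."""
    return words_allocated(instructions) > m
```

With a guarantee of 16 words, a block that allocates two records of 10 and 6 words needs no test, while a block allocating 17 words does.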

The triple LRT represents three environments that keep auxiliary information
for type checking. The label environment L is a map from program labels to their
addresses (offset from the beginning of the program). The register-allocation
environment R maps variables to temporaries (registers or spill locations).3 The type
abbreviation environment T maps type abbreviations to their expansions. Type
abbreviations are used to gain concise type expressions, and the type checker opens
a type abbreviation only when needed.

3 To model the fact that a value in a register can belong to two different types at once (if their
intersection is nonempty), we choose not to use intersection types. Instead, we say that each
variable can have only one type (globally), and in the register-allocation environment more than
one variable at a time can be mapped to the same register.

An LTAL program consists of the above environments and a list of basic blocks,
which can be viewed as a set of function declarations.


3.3 Static Semantics Overview

The low-level type and term constructors in LTAL make the typing system expressive.
Yet we need a decidable and simple type-checking algorithm so that proof
generation can be done without a complicated decision procedure or constraint
solver. To this end, we have made LTAL completely syntax-directed. There are no
subtyping rules; instead, we use coercions to avoid nondeterministic choices during
type checking. We explain various typing judgements, and then show some typing
rules in this section.

The typing judgement for values $LRT; \rho; \phi \vdash v : \tau$ means value $v$ has type $\tau$
under environment $LRT; \rho; \phi$. The triple LRT is part of the program. The kind
environment $\rho$ is a list of kinds for type variables bound so far. In our implementation
we use de Bruijn numbers to represent type variables; the $i$th (starting from
0) element of the kind list $\rho$ is the kind for the type variable of de Bruijn index $i$.
The value environment $\phi$ maps variables to their types.

The judgement $LRT \vdash (\rho; \mathcal{H}; \phi; cc)\ \{\iota\}\ (\rho'; \mathcal{H}'; \phi'; cc')$ means that after instruction
$\iota$ is executed, environment $(\rho; \mathcal{H}; \phi; cc)$ becomes $(\rho'; \mathcal{H}'; \phi'; cc')$. The construction
$\phi, v : \tau$ augments $\phi$. The construction $(\phi \backslash v), v : \tau$ kills dead bindings before adding
the new binding $v : \tau$; it keeps the live bindings unchanged. To be more specific,
for a binding $u : \tau_u$ in $\phi$, if variables $u$ and $v$ are assigned to the same register and $u$
is not live after the execution of instruction $\iota$, the binding $u : \tau_u$ is killed; otherwise,
it remains the same. When there is no ambiguity, we use $\phi, v : \tau$ for both purposes.
The heap-allocation environment $\mathcal{H}$ is explained in Section 3.8. The environment
$cc$ specifies the current status of condition codes.

As an example we will show a simplified rule for an LTAL add instruction. In
Section 3.10 we will show a different typed version of add. These two different typed
versions of add expand to the same SPARC machine instruction. The first rule we
show here is useful for compiling a source-language add for which no dataflow tracking
is needed to prove safety; the second is useful for compiling address arithmetic.
Having multiple LTAL instructions for the same machine instruction simplifies
type-checking.


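$$
\frac{LRT; \rho; \phi \vdash x : \mathsf{int} \qquad LRT; \rho; \phi \vdash y : \mathsf{int}}
     {LRT \vdash (\rho; \mathcal{H}; \phi; cc)\ \{z = x + y\}\ (\rho; \mathcal{H}; (\phi \backslash z), z : \mathsf{int}; cc)}
$$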

In  fact,  this  rule  is  dramatically  simplified  for  clarity.  The  full  version,  which  is 
shown  in  Figure  3.3,  has  ten  premises  and  one  complicated  conclusion. 
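
The simplified rule can be read as a function from the value environment before the instruction to the one after it. The Python sketch below illustrates that reading under stated assumptions: environments are dictionaries from variable names to type names, `reg_of` plays the role of the register-allocation map, and `live_after` is the set of variables live after the instruction. These names are hypothetical, and the kind, heap-allocation, and condition-code environments are omitted because the rule leaves them unchanged.

```python
def kill(phi, reg_of, z, live_after):
    """Drop every binding whose variable shares z's register and is dead
    after the instruction; all other bindings are kept unchanged."""
    return {
        u: t
        for u, t in phi.items()
        if not (reg_of[u] == reg_of[z] and u not in live_after)
    }

def check_add(phi, reg_of, z, x, y, live_after):
    """Simplified rule for z = x + y: both operands must be ints; the new
    value environment kills dead aliases of z and then binds z : int."""
    if phi.get(x) != "int" or phi.get(y) != "int":
        return None            # a premise fails, so the rule does not apply
    new_phi = kill(phi, reg_of, z, live_after)
    new_phi[z] = "int"
    return new_phi
```

In this sketch a dead variable that shares a register with `z` loses its binding, while a dead variable in a different register is left alone, matching the description of the kill construction.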

$$
\begin{array}{lr}
LRT; \rho; \phi \vdash x : \mathsf{int32} & (1) \\
LRT; \rho; \phi \vdash y : \mathsf{int32} & (2) \\
\ell' = \ell + 4 & (3) \\
\mathsf{rmap}(LRT)(z) = t_z & (4) \\
\mathsf{rmap}(LRT)(x) = t_x & (5) \\
\mathsf{realreg}(t_z) = r_z & (6) \\
\mathsf{realreg}(t_x) = r_x & (7) \\
y_m = \mathsf{match\_reg\_or\_imm}(y) & (8) \\
\phi' = \{z : \mathsf{int32}\} \cap (\phi \backslash z) & (9) \\
\mathsf{decode\_list}\ \ell\ \ell'\ P\ P'\ \mathsf{i\_ADD}(r_x, y_m, r_z) & (10) \\
\hline
\multicolumn{2}{l}{LRT \vdash (\ell; \rho; \mathcal{H}; \phi; cc; P)\ \{z = x + y\}\ (\ell'; \rho; \mathcal{H}; \phi'; cc; P')}
\end{array}
$$

Figure 3.3: A sophisticated LTAL type checking rule.

The first and second premises state that both $x$ and $y$ have type int32, the
32-bit integer type. The environment LRT comprises the label, register-allocation,
and type-abbreviation maps. Address $\ell$ is the location of the current instruction
$z = x + y$; $\ell'$ is the location of the next instruction. Premise (3) specifies that the
length of the add instruction is 4 bytes.

Premises (4) and (5) relate variables $z$ and $x$ to their temporary numbers, and
premises (6) and (7) map temporaries to registers; this rule would not be applicable
to operands represented in spill locations (but of course that's true of the actual
SPARC add instruction too). There are about 1000 temporaries (after register
allocation); the first 64 are registers (including 32 floating-point registers), and
the remainder are in the spill area. The per-program rmap (the R component
of LRT) maps variables to temporaries; the program-independent relations realreg
and memtemp relate temporaries to their machine representation.
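
The two-step lookup from variables to temporaries to machine locations can be sketched in Python. The split at 64 comes from the text; everything else is an assumption made for illustration: temporaries are numbered from 0, register temporary `t` is taken to denote register `t`, and spill slots are numbered from 0 after the registers. The function bodies are hypothetical stand-ins for the actual realreg and memtemp relations.

```python
NUM_REGISTER_TEMPS = 64   # from the text: the first 64 temporaries are registers

def realreg(t):
    """Register number for temporary t, or None if t lives in the spill area."""
    return t if 0 <= t < NUM_REGISTER_TEMPS else None

def memtemp(t):
    """Spill-slot index for temporary t, or None if t is a register."""
    return t - NUM_REGISTER_TEMPS if t >= NUM_REGISTER_TEMPS else None

def operand_register(rmap, var):
    """Compose rmap (variable -> temporary) with realreg (temporary -> register),
    as premises (4) to (7) do for the operands of the add rule."""
    return realreg(rmap[var])
```

A variable whose temporary lies in the spill area yields `None` from `operand_register`, which corresponds to the add rule not being applicable to spilled operands.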

Since value $y$ can be either a register or an immediate, we use match_reg_or_imm
in premise (8) to match either a register or an immediate. So $y_m$ can be either
(rmode $r_y$) for some register $r_y$ or (imode $i$) for some immediate $i$. Premise (8)
matches a particular SPARC addressing mode.
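
A possible reading of this premise as a function is sketched below in Python. The representation is hypothetical: immediates are Python integers, variables are strings, and `rmap` maps variables to temporaries with the first 64 temporaries denoting registers. The range check reflects the 13-bit signed immediate field of SPARC arithmetic instructions; whether the actual LTAL rule performs this check at this point is an assumption of the sketch.

```python
def match_reg_or_imm(value, rmap):
    """Return ("rmode", r) for a register operand, ("imode", i) for an
    immediate operand, or None if the value fits neither addressing mode."""
    if isinstance(value, int):
        # SPARC arithmetic immediates are 13-bit signed values (simm13).
        if -4096 <= value <= 4095:
            return ("imode", value)
        return None
    t = rmap.get(value)
    if t is not None and 0 <= t < 64:
        return ("rmode", t)    # the variable is held in a register
    return None                # the variable is spilled or unknown
```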

Premise (9) states the relation between the value typing context before and after
execution of the current instruction. Before we add the type of variable $z$ into the
context, all aliases of $z$ should be killed since they are not live anymore, which is
what $\phi \backslash z$ does. We use an intersection type to extend the old typing environment $\phi$
with the new binding $z : \mathsf{int32}$. The $\phi$ context is small (it just maps currently live
local variables) and is represented as a list, not with dynamic atomic clauses as will
be described in Section 5.3.1.

The decode_list relation in premise (10) maps an instruction encoding (i.e., an
integer) to its semantics. Specifically, it says that the instruction word at the beginning
of machine code $P$ with length $\ell' - \ell$ is an add instruction i_ADD($r_x$, $y_m$, $r_z$).
Machine code $P$ is a sequence of integers (instruction words); the pair $(P, P')$ is
a conventional Prolog difference list [Sethi, 1989, §8.4]. Premise (10) will also be
explained in the next subsection.
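
The difference-list reading of the code sequence can be sketched in Python: each decoding step consumes the first word of the remaining code and hands the rest to the next step, while the location advances by one instruction word. The function names and the toy decoder table are hypothetical, and the table's entries are not real SPARC encodings.

```python
WORD_BYTES = 4  # each instruction word is 4 bytes long

def decode_list(loc, code, decode):
    """Relate (loc, P) to (loc', P', instr) by consuming the first word of P."""
    if not code:
        return None
    instr = decode(code[0])
    if instr is None:
        return None
    return loc + WORD_BYTES, code[1:], instr

def decode_program(code, decode):
    """Thread the difference list: each step's P' is the next step's P."""
    loc, out = 0, []
    while code:
        step = decode_list(loc, code, decode)
        if step is None:
            return None
        next_loc, code, instr = step
        out.append((loc, instr))
        loc = next_loc
    return out

# Hypothetical decoder table for the example; these are not real SPARC words.
TOY_TABLE = {1: "A", 2: "B"}
toy_decode = TOY_TABLE.get
```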

The conclusion is a Hoare-logic style judgement. Under environment LRT, the
instruction $z = x + y$ is at location $\ell$; the length of the instruction is $\ell' - \ell$; this
instruction does not affect the type context $\rho$ or the heap allocation environment $\mathcal{H}$; the value
context $\phi$ becomes $\phi'$ after execution; the machine code at location $\ell'$ is $P'$.

For  a  real-life  program,  the  generated  maps  L,  R,  and  T  can  be  very  large:  The 
sizes  of  L  and  R  are  approximately  linear  in  the  size  of  the  program,  and  we  intend 
to  be  able  to  type-check  programs  with  millions  of  instructions.  In  this  typing 
rule, premises (4) and (5) look up the temporaries of variables $z$ and $x$ in map R;
premise  (8)  looks  up  the  temporary  of  y  if  it  is  not  an  immediate.  Therefore,  an 
efficient  environment  management  scheme  is  necessary;  in  our  implementation,  we 
use  dynamic  clauses  to  efficiently  maintain  various  environments.  We  will  further 
discuss  this  issue  in  Section  5.3. 

3.3.1  Instruction  decoding 

The decode_list relation in premise (10) of the sophisticated rule shown in
Figure 3.3 maps an instruction word to a higher-level instruction with semantic meaning.
Specifically, it says that the instruction word at the beginning of the machine
code $P$ with length $\ell' - \ell$ is an add instruction i_ADD($r_x$, $y_m$, $r_z$). We check for
proper instruction encoding with rules such as the following:


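$$
\begin{array}{|c|c|c|c|c|c|c|}
\hline
10 & Z & 000000 & X & 0 & 00000000 & Y \\
\hline
\text{bits } 31\text{-}30 & 29\text{-}25 & 24\text{-}19 & 18\text{-}14 & 13 & 12\text{-}5 & 4\text{-}0 \\
\hline
\end{array}
$$

$$
\frac{\begin{array}{ll}
32 \cdot 2 + Z = X_9 & 64 \cdot X_9 + 0 = X_7 \\
32 \cdot X_7 + X = X_6 & 2 \cdot X_6 + 0 = X_4 \\
256 \cdot X_4 + 0 = X_1 & 32 \cdot X_1 + Y = W
\end{array}}
{\mathsf{decode}(\mathsf{i\_ADD}(X, \mathsf{rmode}(Y), Z), W)}
$$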

This rule is not an axiom of our system; it is a lemma derived from a more
concise and readable definition of instruction encodings [Michael and Appel, 2000].
The predicate $A \cdot B + C = D$ shown here is a simplification of an actual predicate
that also checks that $C < A$ and that $A$, $B$, $C$, $D$ are natural numbers.
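
The chain of multiply-and-add premises simply packs the instruction fields into one 32-bit word, most significant field first. The Python sketch below follows that chain step by step and checks it against a shift-and-mask decoder. The function names and the intermediate variable names are ours, chosen for the illustration; the field widths are those of the SPARC register-register add format described by the rule.

```python
def step(a, b, c):
    """The predicate A*B + C = D with the side condition C < A; returns D."""
    if not (0 <= c < a and b >= 0):
        raise ValueError("field value out of range")
    return a * b + c

def encode_add_rmode(x, y, z):
    """Instruction word W for i_ADD(X, rmode(Y), Z), one premise per field."""
    x9 = step(32, 2, z)     # op bits 10 (binary), then 5-bit destination Z
    x7 = step(64, x9, 0)    # 6-bit opcode field 000000
    x6 = step(32, x7, x)    # 5-bit first source register X
    x4 = step(2, x6, 0)     # 1-bit flag 0: second operand is a register
    x1 = step(256, x4, 0)   # 8-bit field of zeros
    return step(32, x1, y)  # 5-bit second source register Y

def decode_add_rmode(w):
    """Inverse direction: return (X, Y, Z) if w has this form, else None."""
    y = w & 31
    zeros = (w >> 5) & 255
    flag = (w >> 13) & 1
    x = (w >> 14) & 31
    op3 = (w >> 19) & 63
    z = (w >> 25) & 31
    op = w >> 30
    if (op, op3, flag, zeros) != (2, 0, 0, 0):
        return None
    return (x, y, z)
```

The side condition in `step` is what makes each premise determine its fields uniquely: because the added value is smaller than the multiplier, the word can be decomposed in only one way.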

3.4  Types 

The LTAL type system is very expressive, with support for many advanced features
such as position-independent code, type definitions, singleton types, and polymorphic
function types. The LTAL types are shown in Figure 3.4, and we give a brief
introduction below:

Type  variables:  We  use  named  variables  for  presentation  purposes;  however,  in 
our  actual  implementation,  we  use  de  Bruijn  indices  such  as  0.  The  de  Bruijn 
indices  enable  convenient  manipulation  of  (open)  type  terms. 
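
$$
\begin{array}{rcll}
\tau & ::= & \alpha & \text{type variables} \\
 & \mid & \top & \text{top type} \\
 & \mid & \bot & \text{bottom type} \\
 & \mid & \mathsf{int} & \text{integer type} \\
 & \mid & n & \text{singleton integer type} \\
 & \mid & \mathsf{int}_\pi(\tau) & \text{refined integer type} \\
 & \mid & \mathsf{range}(\tau_1, \tau_2) & \text{range type} \\
 & \mid & \mathsf{def}(\ldots) & \text{type definition} \\
 & \mid & \tau_1 \cap \tau_2 & \text{intersection type} \\
 & \mid & \tau_1 \cup \tau_2 & \text{union type} \\
 & \mid & \mathsf{array}(\tau_1, \tau_2) & \text{array type} \\
 & \mid & \mathsf{codeptr}[\vec{\alpha} : \vec{\kappa}](m, cc, v_1 : \tau_1, \ldots, v_n : \tau_n) & \text{polymorphic code pointer type} \\
 & \mid & \mathsf{offset}(\tau_1, \tau_2) & \text{offset type} \\
 & \mid & \mathsf{field}(i, \tau) & \text{field type} \\
 & \mid & \mathsf{sum}(\tau_1, \tau_2) & \text{sum type} \\
 & \mid & \mathsf{hastag}(\tau_1, \tau_2) & \text{hastag type} \\
 & \mid & \exists \alpha.\tau & \text{existential type} \\
 & \mid & \mu \alpha.\tau & \text{recursive type} \\
 & \mid & \mathsf{box} & \text{immutable reference type} \\
 & \mid & \mathsf{ref} & \text{mutable reference type} \\
 & \mid & \mathsf{addr}(l) & \text{label type} \\
 & \mid & \mathsf{diff}(l_1, l_2) & \text{label-difference type}
\end{array}
$$

Figure 3.4: LTAL syntax: Types.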
Top type: Any term can have the top type $\top$.

Bottom type: No term can have the bottom type $\bot$. It is useful, however, in
some situations. For example, we can use $\mathrm{sum}(\tau, \bot)$ to represent
a data type with no boxed cases.

Integer types: The integer types are bounded. A term of int32 type has a 32-bit
integer value.

Singleton integer types: The singleton integer type is written as $n$. A term of
type $n$ has the integer value $n$. Singleton integer types are used in many
places in our system, such as dataflow analysis.

Refined integer types: LTAL has a set of refined integer types
$\mathrm{int}_\pi(\tau)$ to express relational constraints, where $\pi$ is a
binary operator. For example, $\mathrm{int}_>(9)$ is a refined type that
classifies integers greater than 9.

Range types: A range type is syntactic sugar. The range type
$\mathrm{range}(\tau_1, \tau_2)$ is an abbreviation for
$\mathrm{int}_\geq(\tau_1) \cap \mathrm{int}_<(\tau_2)$.

Type  definitions:  Type  definition  def  is  used  for  concise  type  representation  and 
efficient  type  checking.  The  type  checker  opens  a  type  definition  only  when  it 
is  necessary  to  do  so. 

Intersection types: A term $v$ has an intersection type $\tau_1 \cap \tau_2$ if
$v$ has both type $\tau_1$ and type $\tau_2$.

Union types: A term $v$ has a union type $\tau_1 \cup \tau_2$ if $v$ has either
type $\tau_1$ or type $\tau_2$.

Array types: Type $\mathrm{array}(\tau_1, \tau_2)$ describes an array whose size
is of type $\tau_1$ and whose elements are of type $\tau_2$. Type $\tau_1$ is
expected to be a singleton integer type such as 100.

Polymorphic code pointer types: The type
$\mathrm{codeptr}[\vec{\alpha} : \vec{\kappa}](m, cc, v_1 : \tau_1, \ldots, v_n : \tau_n)$
is for polymorphic code pointers. It takes a list of type variables
$\vec{\alpha} : \vec{\kappa}$ (with their kinds), a type $m$ describing the
memory slots that are available without further availability testing, the
required condition code type $cc$, and a list of arguments and their types.

Offset types: A term $v$ has type $\mathrm{offset}(i, \tau)$ if the value
$v + i$ has type $\tau$. Offset types are used in address arithmetic.

Field  types:  Field  types  are  used  for  constructing  types  of  records.  A  record  type 
is  the  intersection  of  field  types  describing  the  record  fields. 

Sum types: Sum types are used to describe (user-defined) data types. Sum types
have the form $\mathrm{sum}(\tau_r, \tau_u)$, where $\tau_r$ is a range type and
$\tau_u$ is a union type of record types. That is, the $\tau_r$ cases of the
data type described by the sum type are not boxed, and the $\tau_u$ cases are
boxed. See Section 3.8 for detailed descriptions of the use of sum types.

Hastag types: The type $\mathrm{hastag}(\tau_1, \tau_2)$ refines a sum type by
requiring that the value has a tag at the first field of the record.

Existential types: The existential type $\exists\alpha.\tau$ is useful for data
abstraction and information hiding. In LTAL, we use it for typing closures,
tagged sum values, position-independent code, etc.

Recursive types: Recursive types $\mu\alpha.\tau$ model inductively defined
recursive types.

Pointer types: Type boxed is for pointer values.

Immutable reference types: The immutable reference type $\mathrm{box}(\tau)$
describes a pointer that points to some memory slot whose content is of type
$\tau$. The memory slot to which the pointer points is not allowed to be written
after initialization.

Mutable reference types: The ref type is the mutable version of the above
reference type box.
\[
\begin{array}{rcll}
v & ::= & x & \text{variables} \\
 & | & i & \text{known integer value} \\
 & | & l & \text{label value} \\
 & | & c(v) & \text{coercion value} \\
 & | & \mathrm{vdiff}(l_1, l_2) & \text{label-difference value}
\end{array}
\]

Figure 3.5: LTAL syntax: Values.


Label types: Label type $\mathrm{addr}(l)$ describes a label. The value of a
label is an address at which some code pointer resides.

Label-difference types: The type $\mathrm{diff}(l_1, l_2)$ describes a
(compile-time known) value $l_1 - l_2$. This type is used to check address
arithmetic and position-independent code, which we will discuss in Sections 3.9
and 3.10.


3.5  Values 

Values are shown in Figure 3.5. A value can be a variable $x$, an immediate
integer $i$, a label $l$, a coerced value $c(v)$ (where $c$ is a coercion), or a
vdiff value. We use variables to track aliases of registers. Different variables
with different types can be assigned the same register, indicating different
views of the same register to the type-checker. The value constructor vdiff and
type constructors addr and diff are used for address arithmetic and typed
position-independent code. Their meanings are explained in Sections 3.9 and 3.10.

Unlike in the $\lambda$-calculus, functions are not classified as values in
LTAL. LTAL is a typed calculus for low-level code, which has labels and code
blocks. A label value in LTAL stands for the address at which a code block
resides.

The  typing  rules  for  values  are  quite  straightforward,  as  shown  in  Figure  3.6. 
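\[
\frac{\phi(x) = \tau}
{\mathit{LRT}; \rho; \phi \vdash x : \tau}\ \text{ValVar}
\qquad
\frac{}{\mathit{LRT}; \rho; \phi \vdash i : \mathrm{int}_=(i)}\ \text{ValConstant}
\]

\[
\frac{l[\vec{\alpha} : \vec{\kappa}](m, cc, v_1 : \tau_1, \ldots, v_n : \tau_n) = \ldots \ \text{is a code block}}
{\mathit{LRT}; \rho; \phi \vdash l : \mathrm{codeptr}[\vec{\alpha} : \vec{\kappa}](m, cc, v_1 : \tau_1, \ldots, v_n : \tau_n)}\ \text{ValLab}
\]

\[
\frac{}{\mathit{LRT}; \rho; \phi \vdash \mathrm{vdiff}(l_1, l_2) : \mathrm{diff}(l_1, l_2)}\ \text{ValDiff}
\]

\[
\frac{\mathit{LRT}; \rho; \phi \vdash v : \tau_v \qquad \rho; \mathit{LRT} \vdash_c \tau_v \stackrel{c}{\leadsto} \tau}
{\mathit{LRT}; \rho; \phi \vdash c(v) : \tau}\ \text{ValCoerce}
\]

Figure 3.6: Value typing rules.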


The rule worth mentioning is ValLab, the typing rule for a label value. Its
premise states informally that label $l$ is declared as the label of some code
block. In our actual implementation, we use local assumptions (or dynamic
clauses in Prolog) to check that label $l$ is declared. The use of dynamic
clauses enables efficient and concise proof checking; we will explain this in
detail in Section 5.3.


3.6  Coercions 

A coercion only changes the static type of a value; it has no runtime effect. A
coercion $c$ defines a type transformation function $f_c$. If $c$ is applied to
a value $v$ of type $\tau$, we get another value $c(v)$ of type $f_c(\tau)$.
Types $\tau$ and $f_c(\tau)$ should be compatible; more accurately, it should be
provable that $\tau$ is a subtype of $f_c(\tau)$. Coercions simplify
type-checking by telling the checker, in effect, where to apply subtyping.
However, this can significantly increase the size of the LTAL code.
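\[
\begin{array}{rcll}
c & ::= & \mathrm{cid} & \text{identity coercion} \\
 & | & c_1 \circ c_2 & \text{composition coercion} \\
 & | & \mathrm{cfold}[\tau] & \text{fold} \\
 & | & \mathrm{cunfold} & \text{unfold} \\
 & | & \mathrm{cpack}(\tau_1, \tau_2) & \text{pack} \\
 & | & \mathrm{cinjt}[\mathrm{sum}(\tau_r, \tau_u)] & \text{injection} \\
 & | & \mathrm{csum2range} & \text{sum to range} \\
 & | & \mathrm{csum2boxedone} & \text{sum to boxedone} \\
 & | & \mathrm{csum2hastag} & \text{sum to hastag} \\
 & | & \mathrm{cunhastag} & \text{hastag elimination} \\
 & | & \mathrm{c2int32} & \text{int to int32} \\
 & | & \mathrm{ci2nz} & \text{singleton refinement} \\
 & | & \mathrm{crange}[n_1, n_2] & \text{singleton to range} \\
 & | & \mathrm{cinj1}(\tau) & \text{left injection} \\
 & | & \mathrm{cinj2}(\tau) & \text{right injection} \\
 & | & \mathrm{cproj1} & \text{left projection} \\
 & | & \mathrm{cproj2} & \text{right projection} \\
 & | & \mathrm{cdef}\ t & \text{definition introduction} \\
 & | & \mathrm{cname} & \text{definition expansion} \\
 & | & \mathrm{cinters}(c_1, c_2) & \text{intersection} \\
 & | & \mathrm{cunion}(c_1, c_2) & \text{union} \\
 & | & \mathrm{c2inters} & \text{simultaneous coercion} \\
 & | & \mathrm{cfield}\ c & \text{field} \\
 & | & \mathrm{caddr2code} & \text{label to code pointer} \\
 & | & \mathrm{coffset0} & \text{offset 0 introduction} \\
 & | & \mathrm{coffset0elim} & \text{offset 0 elimination} \\
 & | & \mathrm{cptapp}[\tau] & \text{partial instantiation of polymorphic functions}
\end{array}
\]

Figure 3.7: LTAL syntax: Coercions.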

The coercions of LTAL are shown in Figure 3.7. We briefly explain them below:


Identity:  The  identity  coercion  cid  coerces  a  type  to  itself. 

Composition: The composition coercion $c_1 \circ c_2$ applies $c_2$ first, and
then applies $c_1$ to that result.
Fold: The coercion $\mathrm{cfold}[\tau]$ transforms a type $\tau_0$ into a
recursive type $\tau$. The recursive type $\tau$ is of the form
$\mu\alpha.\tau_1$, and $\tau_0 = \tau_1[\tau/\alpha]$.

Unfold: The unfold coercion is the opposite of the fold coercion; it unfolds a
recursive type $\tau_0 = \mu\alpha.\tau_1$ into type
$\tau = \tau_1[\tau_0/\alpha]$.

Pack: The coercion $\mathrm{cpack}(\tau_1, \tau_2)$ coerces a type $\tau$ into
an existential type $\tau_2$. Here $\tau_2$ is of the form
$\exists\alpha.\tau_3$, and $\tau = \tau_3[\tau_1/\alpha]$.[4] The opposite of
the pack coercion is the instruction open (see Section 3.9), which has no
runtime effect, but opens an existential type at type-checking time. Because
open must bind a fresh type variable, it is not convenient to design it as a
coercion. In our implementation, A = open(B) is a virtual instruction with no
runtime effect. At type-checking time, B must be of some existential type
$\exists.\tau$. Assume the current typing environment is $\phi$. We must shift
$\phi$ before adding the new binding $A : \tau$ into the typing environment. The
resulting typing environment is $\phi{\uparrow}, A : \tau$, where $\uparrow$ is
the shift operator in the explicit substitution calculus [Abadi et al., 1990].

[4] In our actual implementation, because we use the de Bruijn index
representation of variable bindings, $\tau_2$ here is an open type and the
result type of the pack coercion is $\exists.\tau_2$.

Injection: The injection coercion $\mathrm{cinjt}[\mathrm{sum}(\tau_r, \tau_u)]$
coerces a type $\tau_u$ into a sum type $\mathrm{sum}(\tau_r, \tau_u)$.

From sum to range: The coercion csum2range coerces a sum type
$\mathrm{sum}(\tau_r, \bot)$ to a range type $\tau_r$. Note that the second
argument of the sum type must be the bottom type $\bot$.

From sum to boxedone: The coercion csum2boxedone coerces a sum type
$\mathrm{sum}(\bot, \tau)$ to type $\tau$ and makes sure that $\tau$ is not a
union type; that is, there is only one case in the boxed part of the sum type.
The compiler uses this one-boxed fact to optimize away the extra boxing. Note
that the first argument of the sum type must be the bottom type $\bot$, stating
that it is boxed and there is no unboxed case.

From sum to hastag: The coercion csum2hastag coerces a sum type
$\mathrm{sum}(\bot, \tau)$ to type $\exists\alpha.\mathrm{hastag}(\alpha, \tau)$,
or $\exists.\mathrm{hastag}(0, \tau{\uparrow})$ in the de Bruijn index
representation. Note that the first argument of the sum type must be the bottom
type $\bot$. The type $\tau{\uparrow}$ results from shifting type $\tau$ one
step in the explicit substitution calculus [Abadi et al., 1990].

Hastag elimination: The coercion cunhastag coerces a hastag type
$\mathrm{hastag}(\tau_{tag}, \tau)$ into $\tau$ if $\tau$ is not a union type
and $\tau \neq \bot$.

From int to int32: The coercion cint2int32 coerces some refined integer type
(such as $\mathrm{int}_=(1)$) or range type (such as $\mathrm{range}(0, 2)$)
into an int32 type.

From singleton type to nonzero integer type: The coercion ci2nz coerces a
singleton integer type to the refined integer type $\mathrm{int}_{\neq}(0)$ if
values of the original type are not equal to zero.

From a singleton type to a range type: The coercion $\mathrm{crange}[n_1, n_2]$
coerces a singleton type $i$ to a range type $\mathrm{range}(n_1, n_2)$. The
coercion rule checks that $n_1 \leq i < n_2$ holds.

Injection left: The injection coercion $\mathrm{cinj1}(\tau_1, \tau_2)$ injects
a type $\tau_1$ into a union type $\tau_1 \cup \tau_2$.

Injection right: The injection coercion $\mathrm{cinj2}(\tau_1, \tau_2)$ injects
a type $\tau_2$ into a union type $\tau_1 \cup \tau_2$.

Projection left: The projection coercion cproj1 transforms an intersection type
$\tau_1 \cap \tau_2$ into its first component, that is, the type $\tau_1$.

Projection right: The projection coercion cproj2 transforms an intersection type
$\tau_1 \cap \tau_2$ into its second component, that is, the type $\tau_2$.

Definition introduction: The coercion cdef coerces a type $\tau$ into a type
definition $\mathrm{def}(t)$ if $t$ is defined as $\tau$.

Definition expansion: The coercion cname expands a def type $\mathrm{def}(t)$
into its definition if the type definition is defined. The LTAL type checker
only expands a type definition when necessary.

Intersection coercion: The coercion $\mathrm{cinters}(c_1, c_2)$ coerces a type
of the form $\tau_1 \cap \tau_2$ into $\tau_1' \cap \tau_2'$ if $c_1$ coerces
$\tau_1$ into $\tau_1'$ and $c_2$ coerces $\tau_2$ into $\tau_2'$.

Union coercion: The coercion $\mathrm{cunion}(c_1, c_2)$ is similar to
$\mathrm{cinters}(c_1, c_2)$ except that it coerces a union type instead of an
intersection type.

From the same type to intersection type: The coercion c2inters coerces a type
$\tau$ into $\tau_1 \cap \tau_2$ if $c_1$ coerces $\tau$ into $\tau_1$ and $c_2$
coerces $\tau$ into $\tau_2$.

Field coercion: The coercion $\mathrm{cfield}\ c$ coerces a field type
$\mathrm{field}(i, \tau)$ into another field type $\mathrm{field}(i, \tau')$ if
$c$ coerces $\tau$ into $\tau'$.

From a label type addr to a codeptr type: The coercion caddr2code coerces a
label type $\mathrm{addr}(l)$ into the label's code pointer type if the label is
declared in the actual code.

Introduction of offset type with offset zero: The coercion coffset0 adds the
prefix offset 0 to a type.

Elimination of offset type with offset zero: This coercion removes the prefix
offset 0 from a type. It is the opposite of the previous one.

Partial instantiation of polymorphic functions: The coercion
$\mathrm{cptapp}[\tau]$ partially instantiates a polymorphic function, with the
first type variable substituted with type $\tau$.

Some of the coercion rules are shown in Figure 3.8. See Appendix A.1 for the
complete set of coercion rules. The coercion typing judgement
$\rho; \mathit{LRT} \vdash_c \tau \stackrel{c}{\leadsto} \tau'$ means that under
the kind environment $\rho$ and maps $\mathit{LRT}$, coercion $c$ changes type
$\tau$ to $\tau'$, and $\tau$ must be a subtype of $\tau'$.

If a value $v$ of type $\tau_1$ is used in a place requiring type $\tau_2$, the
compiler has to insert a coercion $c_{\tau_1,\tau_2}$ that transforms $\tau_1$
to $\tau_2$ explicitly. Thus the choice of subtyping rules is made explicit and
LTAL needs no subtyping rules. Also, coercions make type equivalence rules
unnecessary because two equivalent types can also be coerced to each other. Two
types in LTAL are equivalent if and only if they are exactly the same.
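To make the reading of a coercion as a type transformation function concrete, here is a minimal Python sketch of a checker for a handful of coercions (identity, left and right injection, fold, unfold, and composition). It is an illustration under stated assumptions, not the LTAL checker: types and coercions are encoded as nested tuples, type variables are named rather than de Bruijn indices, and the names `subst`, `apply_coercion`, and `INTLIST` are hypothetical.

```python
def subst(t, name, repl):
    """Replace the type variable `name` by `repl` in type t (no capture handling)."""
    if t[0] == "var":
        return repl if t[1] == name else t
    if t[0] == "mu":
        if t[1] == name:          # an inner binder shadows `name`
            return t
        return ("mu", t[1], subst(t[2], name, repl))
    return (t[0],) + tuple(
        subst(part, name, repl) if isinstance(part, tuple) else part
        for part in t[1:]
    )


def apply_coercion(c, t):
    """Return the result type of coercion c on type t, or raise TypeError."""
    kind = c[0]
    if kind == "cid":
        return t
    if kind == "compose":         # c1 o c2 applies c2 first, then c1
        return apply_coercion(c[1], apply_coercion(c[2], t))
    if kind == "cinj1" and c[1][0] == "union" and c[1][1] == t:
        return c[1]
    if kind == "cinj2" and c[1][0] == "union" and c[1][2] == t:
        return c[1]
    if kind == "cfold" and c[1][0] == "mu" and subst(c[1][2], c[1][1], c[1]) == t:
        return c[1]
    if kind == "cunfold" and t[0] == "mu":
        return subst(t[2], t[1], t)
    raise TypeError(f"coercion {kind} does not apply to {t}")


# A hypothetical list-like recursive type: mu a.([int=(0)] U [int=(1), [int, a]])
INTLIST = ("mu", "a", ("union",
                       ("tuple", ("sing", 0)),
                       ("tuple", ("sing", 1), ("tuple", ("int",), ("var", "a")))))
UNFOLDED = subst(INTLIST[2], "a", INTLIST)
TO_LIST = ("compose", ("cfold", INTLIST), ("cinj1", UNFOLDED))
```

Because every step is named by the coercion, the checker never searches for a subtyping derivation; it only compares types for exact equality, which matches the remark that two LTAL types are equivalent only if they are exactly the same.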

Sometimes  after  applying  a  coercion  we  need  to  use  the  value  both  at  its  old 
type  and  its  new  type.  This  has  been  a  difficulty  in  some  previous  TALs,  which 
assign  types  to  registers:  They  have  to  emit  a  mov  instruction  to  handle  this  case. 

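We solve this problem by assigning types to variables, not to registers: A
variable has only one type, but different variables can be assigned the same
register. A move-with-coercion creates a new variable (in the same register)
without executing an instruction. In effect, the variable name in an LTAL
instruction tells the checker which type to use.

\[
\frac{}{\rho; \mathit{LRT} \vdash_c \tau_1 \stackrel{\mathrm{cinj1}[\tau_1 \cup \tau_2]}{\leadsto} \tau_1 \cup \tau_2}\ \text{CoerceInjectionLeft}
\qquad
\frac{}{\rho; \mathit{LRT} \vdash_c \tau_2 \stackrel{\mathrm{cinj2}[\tau_1 \cup \tau_2]}{\leadsto} \tau_1 \cup \tau_2}\ \text{CoerceInjectionRight}
\]

\[
\frac{\tau' = \tau[\mu\alpha.\tau/\alpha]}
{\rho; \mathit{LRT} \vdash_c \tau' \stackrel{\mathrm{cfold}[\mu\alpha.\tau]}{\leadsto} \mu\alpha.\tau}\ \text{CoerceFold}
\qquad
\frac{}{\rho; \mathit{LRT} \vdash_c \mu\alpha : \kappa.\tau \stackrel{\mathrm{cunfold}}{\leadsto} \tau[\mu\alpha : \kappa.\tau/\alpha]}\ \text{CoerceUnfold}
\]

\[
\frac{\tau_1 : \kappa}
{\rho; \mathit{LRT} \vdash_c \tau_2[\tau_1/\alpha] \stackrel{\mathrm{cpack}[\tau_1, \exists\alpha : \kappa.\tau_2]}{\leadsto} \exists\alpha : \kappa.\tau_2}\ \text{CoercePack}
\]

\[
\frac{}{\rho; \mathit{LRT} \vdash_c \tau_u \stackrel{\mathrm{cinjection}(\mathrm{sum}(\tau_r, \tau_u))}{\leadsto} \mathrm{sum}(\tau_r, \tau_u)}\ \text{CoerceSumInjection}
\]

\[
\frac{\rho; \mathit{LRT} \vdash_c \tau \stackrel{c_2}{\leadsto} \tau' \qquad \rho; \mathit{LRT} \vdash_c \tau' \stackrel{c_1}{\leadsto} \tau''}
{\rho; \mathit{LRT} \vdash_c \tau \stackrel{c_1 \circ c_2}{\leadsto} \tau''}\ \text{CoerceComposition}
\]

\[
\frac{\rho; \mathit{LRT} \vdash_c \tau_1 \stackrel{c_1}{\leadsto} \tau_1' \qquad \rho; \mathit{LRT} \vdash_c \tau_2 \stackrel{c_2}{\leadsto} \tau_2'}
{\rho; \mathit{LRT} \vdash_c \tau_1 \cup \tau_2 \stackrel{\mathrm{cunion}(c_1, c_2)}{\leadsto} \tau_1' \cup \tau_2'}\ \text{CoerceUnion}
\]

Figure 3.8: Selected LTAL coercion rules.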

This means that when we “kill” a variable (by assigning a new value to its
underlying register), we must also kill all the other variables bound to that
register. When adding a new type binding $v : \tau$, we examine each binding
$v' : \tau'$ in $\phi$ and remove it from $\phi$ if $v'$ is assigned the same
register as $v$, which means $v'$ should no longer be live. We use
$(\phi \backslash v), v : \tau$ to represent this operation; it can be seen in
premise (9) of the big rule in Section 3.3. When there is no ambiguity, it is
abbreviated to $\phi, v : \tau$. On the other hand, a move-with-coercion such as
$v = c(v')$ does not require the application of the $\backslash v$ operator;
other aliases of $v$ continue to be active.
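The difference between an ordinary definition, which kills all aliases of a register, and a move-with-coercion, which adds an alias, can be sketched as a small typing environment. This is only an illustrative Python model; the class name `TypeEnv` and its methods are hypothetical, and types are kept as opaque strings.

```python
class TypeEnv:
    """A typing environment phi that also records each variable's register."""

    def __init__(self):
        self.types = {}      # variable -> type
        self.register = {}   # variable -> register name

    def kill(self, reg):
        """Remove every binding whose variable is assigned register `reg`."""
        for var in [v for v, r in self.register.items() if r == reg]:
            del self.types[var]
            del self.register[var]

    def define(self, var, typ, reg):
        """(phi \\ v), v : tau -- the register is overwritten, so old views die."""
        self.kill(reg)
        self.types[var] = typ
        self.register[var] = reg

    def alias(self, new_var, typ, old_var):
        """v = c(v') -- a new view of the same register; nothing is killed."""
        reg = self.register[old_var]
        self.types[new_var] = typ
        self.register[new_var] = reg


env = TypeEnv()
env.define("v0", "[int=(0)]", "d")       # a real instruction writes register d
env.alias("v1", "intlist1", "v0")        # move-with-coercion: v0 and v1 both live
live_after_alias = sorted(env.types)
env.define("w", "int32", "d")            # overwriting d kills v0 and v1
live_after_define = sorted(env.types)
```

In this sketch `live_after_alias` holds both views of register d, while `live_after_define` holds only the newly defined variable.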


3.7  User-Defined  Datatypes 

LTAL’s  low-level  type  constructors  provide  support  for  various  data  representa¬ 
tions,  and  extracting  and  checking  tags.  The  type-checker  can  check  the  connection 
between  a  sum  value  and  its  tag,  and  refine  the  type  of  sum  values  after  tag- 
checking.  We  provide  flexibility  for  the  compiler  writer  to  choose  her  preferred  style 
of  datatype  representation;  the  representations  we  describe  in  this  section  are  not 
new,  but  the  point  is  that  we  can  type  each  aspect  of  their  construction  and  decon¬ 
struction.  Chen  [2004,  Chapter  5]  presents  a  more  detailed  discussion  of  data  type 
representations  and  their  type  checking  in  LTAL  and  some  other  TAL  variants. 

For simplicity, we use the notation $[\tau_0, \tau_1, \ldots, \tau_{n-1}]$ for
tuple types and use the following two type macros:

•  Type $\mathrm{range}(n_1, n_2)$ for type
   $(\mathrm{int}_\geq(n_1)) \cap (\mathrm{int}_<(n_2))$. A sum type is often
   represented as $\mathrm{range}(0, n) \cup \tau$. The number $n$ indicates the
   number of constant constructors, which are represented as integers
   $0, 1, \ldots, n - 1$. Type $\tau$ is the union of types for the boxed
   constructors.

•  Type $\mathrm{hastag}(\tau_{tag}, \tau)$ for
   $(\mathrm{field}(0, \tau_{tag})) \cap \tau$. It means that the tag of a sum
   value has type $\tau_{tag}$, and the sum value is of type $\tau$.
Figure 3.9: Datatype representations in LTAL (panels: intlist1, intlist2,
intlist3, intlist4). Boxed values are marked with a dedicated symbol in the
figure.

3.7.1  Datatype  representation 

The compiler can choose from different data representations for user-defined
datatypes such as intlist:

    datatype intlist = Nil | Cons of int * intlist

Figure 3.9 shows four kinds of data representations of the above intlist
datatype:

1.  The most straightforward representation is to tag each constructor with a
    small integer: Nil is tagged 0, and Cons is tagged 1. In LTAL, this
    representation is expressed as the following type:

    $\mathrm{intlist1} = \mu\alpha.([\mathrm{int}_=(0)] \cup [\mathrm{int}_=(1), [\mathrm{int}, \alpha]])$

    This is a recursive type, whose body is a sum type. A sum type is
    represented as the union of a range type and a tuple type. The range type
    represents the unboxed cases and the tuple type represents the boxed cases.

2.  We assume that small integers can be distinguished from pointers, thus
    constant data constructors can be represented as small integers: Nil is
    represented as integer 0; Cons is a boxed record with tag 0. In LTAL, this
    representation is expressed as the following type:

    $\mathrm{intlist2} = \mu\alpha.(\mathrm{range}(0, 1) \cup [\mathrm{int}_=(0), [\mathrm{int}, \alpha]])$

3.  In the data representation of the Cons case in intlist1 and intlist2, there
    are two layers of boxing, one for tags and one for actual user data. We can
    optimize the representation so that only one boxing is needed. In LTAL, this
    representation is expressed as the following type:

    $\mathrm{intlist3} = \mu\alpha.(\mathrm{range}(0, 1) \cup [\mathrm{int}_=(0), \mathrm{int}, \alpha])$

4.  A datatype with only one value-carrying constructor can be optimized
    further. It need not be tagged since there is only one boxed case. In LTAL,
    this representation is expressed as the following type:

    $\mathrm{intlist4} = \mu\alpha.(\mathrm{range}(0, 1) \cup [\mathrm{int}, \alpha])$

The intlist4 representation is specially optimized for data types with only one
boxed case. If there are multiple boxed cases, the tag field cannot be omitted
and the intlist3 representation could be used, as the representation of the
datatype example in Section 3.7.3 shows.
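The cost difference between the four representations can be seen by encoding the same list under each of them. The sketch below is a Python model only: each tuple stands for one boxed (heap-allocated) record, plain integers stand for unboxed values, and the function names `encode`, `count_boxes`, and `count_words` are hypothetical.

```python
def encode(xs, style):
    """Encode a list of ints under representation intlist<style> (1..4)."""
    if style not in (1, 2, 3, 4):
        raise ValueError("style must be 1, 2, 3, or 4")
    value = (0,) if style == 1 else 0      # Nil: boxed tag 0, or unboxed 0
    for head in reversed(xs):
        if style == 1:
            value = (1, (head, value))     # [int=(1), [int, a]]
        elif style == 2:
            value = (0, (head, value))     # [int=(0), [int, a]]
        elif style == 3:
            value = (0, head, value)       # [int=(0), int, a]
        else:
            value = (head, value)          # [int, a]
    return value


def count_boxes(value):
    """Number of heap records reachable from value."""
    if not isinstance(value, tuple):
        return 0
    return 1 + sum(count_boxes(field) for field in value)


def count_words(value):
    """Total number of record fields (memory words) reachable from value."""
    if not isinstance(value, tuple):
        return 0
    return len(value) + sum(count_words(field) for field in value)
```

For a two-element list, the model allocates 5, 4, 2, and 2 records under representations 1 through 4, and intlist4 additionally saves the tag word that intlist3 keeps in every Cons cell.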
3.7.2  Creating sum values

We create an empty list of intlist1 by building a 1-element record v0 = [0],
then coercing it to type intlist1:

LTAL                                                          SPARC
(assume v0 : [int=(0)])
v1 = cinj1([int=(0)] ∪ [int=(1), [int, intlist1]])(v0)        (no instruction)
v2 = cfold[intlist1](v1)                                      (no instruction)

The only difference between v0, v1, and v2 is their types. They have different
types created by coercions, but they are assigned the same register, so no SPARC
instruction is emitted for the above LTAL instructions.

Because the coercions are inserted explicitly, the type-checker can easily tell
that value v0 can be coerced to type intlist1. In the first step, it simply
checks that the type of v0 is the first part of the union type
[int=(0)] ∪ [int=(1), [int, intlist1]] (by the rule of coercion cinj1). After
this step, the type of v1 is [int=(0)] ∪ [int=(1), [int, intlist1]]. In the
second step, if the type of v1 is exactly the same as the body of intlist1 with
type variable α replaced with intlist1 (coercion cfold), the type of v1 is
coerced into intlist1 (the result type of v2).

The following LTAL instructions create an empty list of intlist4 by coercing
integer 0 to be of type intlist4.

LTAL                                                SPARC
v1 = crange[0, 1](0)                                mov 0, d1
v2 = cinj1(range(0, 1) ∪ [int, intlist4])(v1)
v3 = cfold[intlist4](v2)

Coercion crange[n1, n2] changes a value of type int=(n) to type range(n1, n2) if
n1 ≤ n < n2. In the first instruction the type-checker only needs to check that
0 ≤ 0 < 1 holds.
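The side condition of the singleton-to-range coercion is a plain arithmetic check. A minimal Python sketch, assuming the half-open reading of range types used in this section (lower bound included, upper bound excluded); the function names `crange` and `members` are hypothetical stand-ins for the checker's rule.

```python
def crange(n1, n2, i):
    """Coerce the singleton type int=(i) to range(n1, n2), checking n1 <= i < n2."""
    if not (n1 <= i < n2):
        raise TypeError(f"int=({i}) is not within range({n1}, {n2})")
    return ("range", n1, n2)


def members(range_type):
    """List the integers classified by a range type."""
    _, low, high = range_type
    return list(range(low, high))
```

For example, `crange(0, 1, 0)` succeeds because 0 ≤ 0 < 1, while `crange(0, 1, 1)` is rejected.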


3.7.3  Eliminating  sum  values 

Consider what happens when doing case discrimination on a boxed-tag style of sum
type representation, such as is used when there are multiple value-carrying
constructors. Given a value $x$ of sum type, one fetches its tag into a variable
$y$, then does a conditional branch on $y$; at this point, the difficulty is in
relating the outcome of the conditional branch to the refined type of $x$. One
solution is to use a “macro” TAL instruction to code for the
load-compare-branch instruction sequence. We wanted to avoid all such macro
instructions since they hinder some compiler optimizations such as instruction
scheduling. We use type quantification and singleton types to keep track of the
implicit dataflow.

Consider the following user-defined datatype

    datatype T = A | B | C of int | D of int * T

which can be represented in LTAL as:

    $T = \mu\alpha.(\mathrm{range}(0, 2) \cup [\mathrm{int}_=(0), \mathrm{int}] \cup [\mathrm{int}_=(1), \mathrm{int}, \alpha])$.

This representation is shown pictorially in Figure 3.10. Since the datatype T
has two value-carrying constructors (C and D), the tag field cannot be omitted.
This representation is similar to the intlist3 representation shown before.
Figure 3.10: LTAL datatype representation example. A and B are the unboxed
integers 0 and 1; C is a boxed record [0, int]; D is a boxed record
[1, int, T]. Boxed values are marked with a dedicated symbol in the figure.


LTAL                                                        SPARC
      v0 = cunfold(v)
      (β, v0') = testbox(v0)                                      subcc d, 256
      ifboxed (v0') then (v1, l_CD) else (v2, l_AB)               bge l_CD
l_AB: ...                                                   l_AB: ...
l_CD: (α1, v3) = open(v1)                                   l_CD:
      t = gettag(v3, 0)                                           ld [d], dt
      cmpcc(t, 0)                                                 subcc dt, 0, %g0
      iftag (=) {v3} then (v3', l_C) else (v3'', l_D)             be l_C
l_D:  ...                                                   l_D:  ...
l_C:  ...                                                   l_C:  ...

Figure 3.11: Datatype tag discrimination example. (Variables v0, v, v0', v1, v2,
v3, v3', and v3'' are all assigned register d, and variable t is assigned
register dt.)


“Switching” on sum values in the source program

    case (v : T) of A => e_A
                  | B => e_B
                  | C(x) => e_C
                  | D(x, y) => e_D

is translated to the LTAL and SPARC instruction sequence in Figure 3.11.
We need to generate code that tests $v$ to decide which branch to take. Each
test and each branch should be an explicit LTAL instruction. We first test
whether $v$ is boxed or not. From our assumption that no pointers point to the
first 256 words in the memory, if $v$ is a small integer (less than 256), then
it is unboxed. That is, it is either A or B. Otherwise it is C or D.

The type checking rules used for datatype tag discrimination are shown in
Figure 3.12. Instruction testbox performs this test and sets condition codes.
Instruction ifboxed examines the condition codes and rebinds two fresh variables
$v_1$ and $v_2$ with refined types for the boxed and unboxed cases,
respectively. Variable $v_1$ has type
$\exists\alpha.\mathrm{hastag}(\alpha, [\mathrm{int}_=(0), \mathrm{int}] \cup [\mathrm{int}_=(1), \mathrm{int}, T])$,
which means it is tagged (we do not know the tag yet). Variable $v_2$ has type
$\mathrm{range}(0, 2)$, which means it is either 0 or 1. Both $v_1$ and $v_2$
are forced to be assigned the same register as $v_0$, so no machine instruction
is needed to move $v_0$ to $v_1$ or $v_2$.

In the unboxed case, we further test if $v_2$ is 0 or 1, which is easy. In the
boxed case, we need to test the tag of $v_1$. Variable $v_1$ hides the type of
its tag by existential types. We first open $v_1$ to $v_3$ and bind a brand new
type variable $\alpha_1$. Again, no SPARC instruction is needed because $v_1$
and $v_3$ are assigned the same register. Variable $v_3$ has type
$\mathrm{hastag}(\alpha_1, [\mathrm{int}_=(0), \mathrm{int}] \cup [\mathrm{int}_=(1), \mathrm{int}, T])$.

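Instruction gettag extracts the tag $t$ and gives it type
$\mathrm{int}_=(\alpha_1)$. Then cmpcc checks whether tag $t$ is 0 and sets the
condition-code environment to $\mathrm{cc\_cmp}(\alpha_1, 0)$. Instruction iftag
checks the condition codes set by cmpcc, rebinds two new variables $v_3'$ and
$v_3''$ as aliases of $v_3$, and does a conditional branch. Specifically, the
type checking rule of the iftag instruction checks that $cc$ is
$\mathrm{cc\_cmp}(\tau_0, 0)$, that $v_3$ is of type
$\mathrm{hastag}(\tau_g, \tau)$, and that $\tau_0 = \tau_g$; in this example,
both $\tau_0$ and $\tau_g$ are $\alpha_1$. Then it refines the types of $v_3'$
and $v_3''$ to $[\mathrm{int}_=(0), \mathrm{int}]$ and
$[\mathrm{int}_=(1), \mathrm{int}, T]$, respectively. This refinement rules out
disjuncts by the result of comparing tags with integers. A constraint solver as
in DTAL [Xi and Harper, 2001] is overkill for our purpose.

\[
\frac{\mathit{LRT}; \rho; \phi \vdash v : \tau \qquad \phi' = (\phi \backslash v), v' : \mathrm{int}_=(\alpha) \cap \tau \qquad cc' = \mathrm{cc\_testbox}(\alpha)}
{\mathit{LRT} \vdash (\rho; \phi; cc)\ \{(\alpha, v') = \mathrm{testbox}(v)\}\ (\rho; \phi'; cc')}\ \text{InstrTestbox}
\]

\[
\frac{\begin{array}{c}
\mathit{LRT}; \rho; \phi \vdash v : \mathrm{int}_=(\tau_\alpha) \cap (\mathrm{int}_=(0) \cup \ldots \cup \mathrm{int}_=(n - 1) \cup \tau') \\
\tau' = \tau_1 \cup \tau_2 \cup \ldots \cup \tau_m \\
cc = \mathrm{cc\_testbox}(\tau_\alpha) \qquad n < 256 \\
\tau_i = (\mathrm{field}(0, \mathrm{int}_=(tag_i))) \cap \tau_i' \quad (\text{for all } 1 \leq i \leq m) \\
\mathit{LRT}; \rho; \phi, v_1 : \tau'; cc \vdash_\ell l_1 \\
\mathit{LRT}; \rho; \phi, v_2 : \mathrm{range}(0, n); cc \vdash_\ell l_2
\end{array}}
{\mathit{LRT} \vdash (\rho; \phi; cc)\ \{\mathrm{ifboxed}\ (v)\ \mathrm{then}\ (v_1, l_1)\ \mathrm{else}\ (v_2, l_2)\}\ (\_; \_)}\ \text{InstrIfboxed}
\]

\[
\frac{\mathit{LRT}; \rho; \phi \vdash v : \exists\alpha : \kappa.\tau}
{\mathit{LRT} \vdash (\rho; \phi; cc)\ \{(\alpha, v_0) = \mathrm{open}(v)\}\ (\rho, \alpha : \kappa; \phi, v_0 : \tau; cc)}\ \text{InstrOpen}
\]

\[
\frac{\mathit{LRT}; \rho; \phi \vdash v' : \mathrm{hastag}(\tau_{tag}, \tau_u) \qquad \phi' = (\phi \backslash v), v : \mathrm{int}_=(\tau_{tag})}
{\mathit{LRT} \vdash (\rho; \phi; cc)\ \{v = \mathrm{gettag}(v')\}\ (\rho; \phi'; cc)}\ \text{InstrGettag}
\]

\[
\frac{\mathit{LRT}; \rho; \phi \vdash v_1 : \mathrm{int}_=(\tau_1) \qquad \mathit{LRT}; \rho; \phi \vdash v_2 : \mathrm{int}_=(\tau_2)}
{\mathit{LRT} \vdash (\rho; \phi; cc)\ \{\mathrm{cmpcc}(v_1, v_2)\}\ (\rho; \phi; \mathrm{cc\_cmp}(\tau_1, \tau_2))}\ \text{InstrCmpcc}
\]

\[
\frac{\begin{array}{c}
\mathit{LRT}; \rho; \phi \vdash v : \mathrm{hastag}(\tau_\alpha, \tau_u) \\
cc = \mathrm{cc\_cmp}(\tau_\alpha, i) \\
\tau_u = \tau_1 \cup \tau_2 \cup \ldots \cup \tau_n \\
\tau_j = \mathrm{field}(0, \mathrm{int}_=(tag_j)) \cap \tau_j' \quad (\text{for all } 1 \leq j \leq n) \\
\tau_t = \bigcup_{1 \leq j \leq n} \tau_j \ \text{where } i\ \pi\ tag_j \text{ holds} \\
\tau_f = \bigcup_{1 \leq k \leq n} \tau_k \ \text{where } i\ \pi\ tag_k \text{ does not hold} \\
\mathit{LRT}; \rho; \phi, v_1 : (\mathrm{field}(0, \mathrm{int}_=(\tau_\alpha))) \cap \tau_t; cc \vdash_\ell l_1 \\
\mathit{LRT}; \rho; \phi, v_2 : (\mathrm{field}(0, \mathrm{int}_=(\tau_\alpha))) \cap \tau_f; cc \vdash_\ell l_2
\end{array}}
{\mathit{LRT} \vdash (\rho; \phi; cc)\ \{\mathrm{iftag}\ (\pi)\ \{v\}\ \mathrm{then}\ (v_1, l_1)\ \mathrm{else}\ (v_2, l_2)\}\ (\_; \_)}\ \text{InstrIftag}
\]

Figure 3.12: Rules for datatype tag discrimination.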

The connection between a tagged value and its tag is established by existential
types: every time we open a variable of type
$\exists\alpha.\mathrm{hastag}(\alpha, \tau)$ and assign it to some variable
$v$, we get a fresh type variable $\alpha'$ ($\alpha_1$ in the above example).
Only $v$'s type contains the new type variable $\alpha'$ in the first conjunct
$(\mathrm{field}(0, \alpha'))$, and only by instruction gettag($v$, 0) can we
get a variable of type $\alpha'$.

For simplicity we use linear search here. LTAL also permits binary search; to
do an indexed jump we would need to extend LTAL, but our underlying semantic
model will permit this in a modular way.
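The refinement performed by iftag, and the runtime decisions it tracks, can be sketched in a few lines of Python. This is a model under stated assumptions rather than the checker itself: disjuncts are (tag, type) pairs with types as opaque strings, unboxed constructors are modeled as Python integers and boxed records as tuples, and the names `OPS`, `refine_by_tag`, and `discriminate` are hypothetical.

```python
import operator

# Comparison operators an iftag-style instruction might carry (assumed set).
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt, ">": operator.gt}


def refine_by_tag(disjuncts, op, i):
    """Split tagged disjuncts by whether `i op tag` holds (then-part, else-part)."""
    holds = OPS[op]
    then_part = [(tag, ty) for tag, ty in disjuncts if holds(i, tag)]
    else_part = [(tag, ty) for tag, ty in disjuncts if not holds(i, tag)]
    return then_part, else_part


def discriminate(value):
    """Linear-search case discrimination for datatype T = A | B | C | D."""
    if isinstance(value, int):        # testbox / ifboxed: the unboxed case
        if value == 0:
            return "A"
        if value == 1:
            return "B"
        raise ValueError("not a constant constructor of T")
    tag = value[0]                    # gettag(v, 0)
    if tag == 0:                      # cmpcc(t, 0); iftag (=)
        return "C"
    if tag == 1:
        return "D"
    raise ValueError("unknown tag")
```

Splitting the disjuncts of T's boxed part on `=` against 0 leaves exactly the C record in the then-branch and the D record in the else-branch, mirroring the refined types of the two aliases in the example.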


3.8  Heap  Allocation 

In  this  section  we  briefly  present  the  heap  allocation  model  used  in  LTAL  and  the 
FPCC/ML  compiler.  Chen  [2004,  Chapter  4]  gives  a  more  detailed  discussion  of  the 
model,  including  record  allocation,  known-  and  unknown-length  array  allocation, 
and  their  type  checking. 

Like SML/NJ, our compiler allocates closures and records in registers or on the
heap; we don't push and pop the stack. At present, our type system (like most
TALs) does not accommodate reasoning about garbage collection either. We intend
to handle stacks and GC in the future, after we develop a unified theory of
stack and heap deallocation (probably based on a region calculus).

As in SML/NJ, with so much heap allocation we need extremely efficient, in-line
allocation of records. We model the allocable heap memory as a large contiguous
region bounded by two pointers, allocptr and limitptr. Heap allocation is broken
into two steps: first, test whether there is enough memory for allocation; second,
initialize memory.

Figure 3.13: SML/NJ Heap allocation model. [Diagram labels: allocptr, limitptr, 4096, CHUNK 1, CHUNK 3.]

The heap allocation model is shown in Figure 3.13. Before the runtime system
starts executing a program, it reserves a chunk of memory, and sets allocptr
to the lowest address of the memory chunk and limitptr to the highest address
(minus a constant C = 4096). When the program needs n memory words, where
4n < C, it tests whether allocptr < limitptr; if so, then at least n words must
be available. Then it writes n words consecutively to addresses allocptr through
allocptr + 4n - 4, and then increases allocptr by 4n.

    LTAL                              SPARC

    l0: testmem(3)                    l0: subcc allocptr, limitptr, %g0
        iffull then l1 else l2            bg    l1
    l2: init(0, v0)                   l2: st    d0, [allocptr + 0]
        init(1, v1)                       st    d1, [allocptr + 4]
        init(2, v2)                       st    d2, [allocptr + 8]
        v = record                        mov   allocptr, d
        inc_allocptr(3)                   add   allocptr, 12, allocptr
    l1: ...                           l1: ...

Figure 3.14: Heap allocation example.
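The two-step protocol described above (limit test, then initialization) can be sketched as a toy model. This is only an illustration under stated assumptions: the class `Heap`, its `mem` dictionary, and the use of exceptions for the full case are hypothetical and are not part of the FPCC/ML runtime; only the names allocptr and limitptr, the word size 4, and the constant C = 4096 come from the text.

```python
WORD = 4      # bytes per word, as assumed in the text
C = 4096      # slack constant C from the text


class Heap:
    """Toy model of the region bounded by allocptr and limitptr (hypothetical)."""

    def __init__(self, lowest, highest):
        self.mem = {}                      # address -> word; stands in for memory
        self.allocptr = lowest             # lowest address of the reserved chunk
        self.limitptr = highest - C        # highest address minus the constant C

    def alloc(self, fields):
        n = len(fields)
        if not WORD * n < C:
            raise ValueError("record too large for a single limit test")
        # Step 1: the limit test (the role played by testmem / iffull).
        if not self.allocptr < self.limitptr:
            raise MemoryError("heap full")
        # Step 2: initialize n consecutive words starting at allocptr.
        for i, value in enumerate(fields):
            self.mem[self.allocptr + WORD * i] = value
        record = self.allocptr             # the record is the old allocptr
        self.allocptr += WORD * n          # bump allocptr by 4n
        return record
```

The single comparison suffices because allocptr < limitptr = highest - C and 4n < C together give allocptr + 4n < highest, so the n stores stay inside the reserved chunk.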

The LTAL instruction sequence in Figure 3.14 creates a 3-field record [v0, v1, v2]
and assigns it to v. The corresponding SPARC instructions are on the right side of
the table (d, d0, d1, d2 are registers assigned to LTAL variables v, v0, v1, v2).

Block l0 tests if there are at least 3 words in the memory for allocation; after the
testmem comparison the condition-code environment is cc_testmem(3). Then the
branch instruction iffull “consumes” this condition code, and statically guarantees
3 words in the fall-through case (memory is not full).

Block l2 initializes the three newly allocated words. Instruction init(i, vi)
initializes the word whose address is allocptr + 4i with vi. Instruction v = record
copies allocptr to v and v gets the record type. Instruction inc_allocptr(n) increases
allocptr by 4n.

The  instruction  sequence  for  allocation  is  not  fixed.  The  instruction  scheduler 
can  shuffle  these  instructions  with  others,  as  long  as  certain  constraints  hold. 


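An allocation environment $\mathcal{H}$ is used to check heap allocation. It consists of
three parts: the number of words that are guaranteed to be available in the memory,
the largest index of initialized fields, and the type of the partial record initialized so
far. We don’t need the initialization flags used in TALx86 [Morrisett et al., 1999a].

\[
\frac{0 < n < 1024}
     {\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{\mathrm{testmem}(n)\}\ (\rho;\ \mathcal{H};\ \phi;\ \mathrm{cc\_testmem}(n))}
\quad \text{InstrTestmem}
\]

\[
\frac{cc = \mathrm{cc\_testmem}(n) \qquad
      \mathit{LRT};\ \rho;\ \mathcal{H};\ \phi;\ cc \vdash_{\ell} l_1 \qquad
      \mathit{LRT};\ \rho;\ (n, -1, \top);\ \phi;\ cc \vdash_{\ell} l_2}
     {\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{\mathrm{iffull\ then}\ l_1\ \mathrm{else}\ l_2\}\ (\_;\ \_;\ \_;\ \_)}
\quad \text{InstrIffull}
\]

\[
\frac{\mathit{LRT};\ \rho;\ \phi \vdash v_i : \mathrm{int}_{=}(i) \qquad 0 \le i < n \qquad m' = \max(m, i) \qquad
      \mathit{LRT};\ \rho;\ \phi \vdash v : \tau_0 \qquad \tau' = \tau \cap (\mathrm{field}(4i, \tau_0))}
     {\mathit{LRT} \vdash (\rho;\ (n, m, \tau);\ \phi;\ cc)\ \{\mathrm{init}(v_i, v)\}\ (\rho;\ (n, m', \tau');\ \phi;\ cc)}
\quad \text{InstrInit}
\]

\[
\frac{}
     {\mathit{LRT} \vdash (\rho;\ (n, m, \tau);\ \phi;\ cc)\ \{v = \mathrm{record}\}\ (\rho;\ \mathcal{H};\ \phi, v : \tau;\ cc)}
\quad \text{InstrRecord}
\]

\[
\frac{\mathit{LRT};\ \rho;\ \phi \vdash v : \mathrm{int}_{=}(n') \qquad m < n' \le n}
     {\mathit{LRT} \vdash (\rho;\ (n, m, \tau);\ \phi;\ \mathrm{cc\_testmem}(k))\ \{\mathrm{inc\_allocptr}(v)\}\ (\rho;\ (n - n', -1, \top);\ \phi;\ \mathrm{cc\_none})}
\quad \text{InstrIncAllocptr1}
\]

\[
\frac{\mathit{LRT};\ \rho;\ \phi \vdash v : \mathrm{int}_{=}(n') \qquad m < n' \le n \qquad cc \ne \mathrm{cc\_testmem}(k)}
     {\mathit{LRT} \vdash (\rho;\ (n, m, \tau);\ \phi;\ cc)\ \{\mathrm{inc\_allocptr}(v)\}\ (\rho;\ (n - n', -1, \top);\ \phi;\ cc)}
\quad \text{InstrIncAllocptr2}
\]

Figure 3.15: Rules for allocation instructions.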

The typing rules for the allocation instructions are shown in Figure 3.15. The
judgement $\mathit{LRT};\ \rho;\ \mathcal{H};\ \phi;\ cc \vdash_{\ell} l$ states that the signature of block $l$ matches the
current environment; for heap allocation, the environment $\mathcal{H}$ must hold. If this
judgement holds, it is safe to jump to block $l$, and safe to allocate the amount of
memory specified by the environment $\mathcal{H}$ without an out-of-heap test. Instructions
testmem and iffull establish the allocation environment in which the init instructions
type-check. The compiler can (and does) optimize by making one iffull cover the
sequential allocation of several different records in a control-flow path that covers
several basic blocks. The parameter m of codeptr conveys the necessary information
about how much memory is guaranteed to remain.

A tuple type $[\tau_0, \tau_1, \ldots, \tau_{n-1}]$ is represented in LTAL as

$(\mathrm{field}(0, \tau_0)) \cap (\mathrm{field}(1, \tau_1)) \cap \ldots \cap (\mathrm{field}(n-1, \tau_{n-1}))$.

If v has this type, then the word located at memory address v has type $\tau_0$, at address
v + 4 type $\tau_1$, etc. (assuming the word size is 4). When a field is initialized by an init
instruction, one more conjunct (a field type) is added to the type of the partial
record in the allocation environment.

After initialization, the allocptr is copied to a variable (with record type) by
instruction v = record, and then the allocptr is adjusted to point to the next
available memory word by instruction inc_allocptr. After instruction inc_allocptr,
the condition codes set by testmem are invalid because allocptr has been changed.
So we reset the condition-code environment if it is cc_testmem.

In the above example, $\mathcal{H}$ is $(3, -1, \top)$ when checking block $l_2$. The number 3
means $l_2$ needs 3 words in the heap; the second number $-1$ means no field has been
initialized; the type $\top$ means none of the 3 words is initialized. The environment
becomes $(3, 0, \mathrm{field}(0, \tau_0))$ after instruction init(0, v0), $(3, 1, (\mathrm{field}(0, \tau_0)) \cap (\mathrm{field}(4, \tau_1)))$
after instruction init(4, v1), and $(3, 2, (\mathrm{field}(0, \tau_0)) \cap (\mathrm{field}(4, \tau_1)) \cap (\mathrm{field}(8, \tau_2)))$ after
instruction init(8, v2), where $\tau_0$, $\tau_1$, and $\tau_2$ are the types of v0, v1, and v2, respectively.
Variable v gets type $(\mathrm{field}(0, \tau_0)) \cap (\mathrm{field}(4, \tau_1)) \cap (\mathrm{field}(8, \tau_2))$, which is the third part
of $\mathcal{H}$ at this point, after instruction v = record. Instruction inc_allocptr(3) clears
$\mathcal{H}$ to be $(0, -1, \top)$.
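The way the allocation environment evolves in this example can be traced with a small sketch. It is an illustration only, under these assumptions: the environment is modelled as a Python tuple (n, m, partial type), the type top is modelled as an empty tuple, an intersection of field types as a tuple of ("field", offset, type) entries, and init takes a field index i and records byte offset 4i as in the InstrInit rule. The function names are hypothetical and do not come from the LTAL checker.

```python
TOP = ()   # the empty intersection stands in for the type "top"


def after_iffull(n):
    """Environment in the not-full branch: n words guaranteed, none initialized."""
    return (n, -1, TOP)


def init(env, i, field_type):
    """Model of init: field index i must lie inside the n guaranteed words."""
    n, m, partial = env
    if not 0 <= i < n:
        raise TypeError(f"field index {i} outside the {n} guaranteed words")
    return (n, max(m, i), partial + (("field", 4 * i, field_type),))


def record(env):
    """Type given to v by 'v = record': the partial record type built so far."""
    return env[2]


def inc_allocptr(env, k):
    """Model of inc_allocptr: k must cover every initialized field (m < k <= n)."""
    n, m, _ = env
    if not m < k <= n:
        raise TypeError(f"cannot advance by {k} words with environment ({n}, {m}, ...)")
    return (n - k, -1, TOP)


env = after_iffull(3)                       # (3, -1, top)
for index, ty in enumerate(["t0", "t1", "t2"]):
    env = init(env, index, ty)              # ends as (3, 2, field0 & field4 & field8)
v_type = record(env)                        # type bound to v
final_env = inc_allocptr(env, 3)            # cleared environment
```

The side condition m < k in `inc_allocptr` is what rejects a program that advances allocptr past fewer words than it has initialized.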

3.9  Instructions 

LTAL  has  a  number  of  instructions  to  allow  efficient  type  checking  of  heap  allocation 
processes,  position-independent  code,  user-defined  data  type  tag  discrimination, 
condition  codes,  and  polymorphic  code  blocks.  The  LTAL  instructions  are  listed  in 
Figure  3.16. 

We  briefly  explain  their  informal  meanings  and  type  checking  below: 

open:  The  open  instruction  has  no  runtime  effect.  The  compiler  assigns  the  old 
and  new  variables  to  the  same  register,  and  only  the  type  is  changed.  It  is 
opposite  to  the  cpack  coercion. 

move:  The  statement  v  =  v'  is  a  move  instruction  if  the  registers  assigned  to 
variables  v  and  v'  are  different.  If  v  and  v'  are  assigned  to  the  same  register, 
it  has  no  runtime  effect,  but  copies  the  type  of  v'  to  v. 

ALU  instructions:  LTAL  has  a  set  of  standard  ALU  instructions  such  as  addition, 
subtraction,  and  multiplication. 

sethi: If an integer is too big to fit in the immediate operand field of an
instruction, sethi is used to set the high bits first. The semantics of instruction
sethi is the same as that of the SPARC sethi instruction. The instruction
v = sethi(n) zeroes the least significant 10 bits of variable v’s register, and puts
the immediate n in the high 22 bits.

    Instructions  ι ::=

         LTAL instruction                               Machine instruction

         (α, v') = open(v)                              no instruction
         v' = v                                         move, or nop
         v = v1 op v2                                   ALU instructions
         v = sethi(n)                                   sethi
         v = load(v1)                                   load
         v = store(v1)                                  store
         v = addradd(v1, v2)                            add
         v = select(v1, v2)                             load
         v = gettag(v)                                  load
         init(v1, v)                                    store
         v = record                                     move
         inc_allocptr(v)                                add
         call(v, [τ1, ..., τn])                         jump
         calln(l, [τ1, ..., τn])                        fall through
      *  cmpcc(v1, v2)                                  subcc
      *  (α, v1') = cmpcci(v1, v2)                      subcc
      *  (α, v') = testbox(v)                           subcc
      *  testmem(n)                                     subcc
      *  if (π) then l1 else l2                         branch
      *  iffull then l1 else l2                         branch
      *  ifboxed (v) then (v1, l1) else (v2, l2)        branch
      *  ifboxedone (v) then (v1, l1) else (v2, l2)     branch
      *  iftag (π) {v} then (v1, l1) else (v2, l2)      branch

Figure 3.16: LTAL syntax: Instructions. Marked * operators are specific to
machines with condition codes.


load: The LTAL instruction v = load(v1) maps to a SPARC load instruction. It
loads v1 (in memory) into v (in register). In the new typing environment,
variable v gets v1’s type, and bindings whose variables are assigned to the
same register as v are killed.
store: The LTAL instruction v = store(v1) maps to a SPARC store instruction.
It stores v (in register) into v1 (in memory). The type checking rule is very
similar to that of the load instruction.

addradd: The addradd instruction is for address arithmetic. It is used for type
checking position-independent code. The instruction v = addradd(v1, v2) assigns
v1 + v2 to v, where v1 is a label value and v2 is a vdiff value. Operand v1 is of label
type addr(l) for some label l. Operand v2 is a known integer, which is the
difference between two labels. The type of v2 is diff(l1, l2) for some labels l1
and l2, and the value of v2 is l1 - l2, which is a compile-time known integer.
For type checking position-independent code, v1 is usually the base label, and
v2 is the offset of a label from the base. See Section 3.10 for the details of
position-independent code type checking.

select:  The  select  statement  corresponds  to  a  memory  load  machine  instruction. 
It  loads  a  record  field. 

gettag: The gettag statement also corresponds to a memory load machine
instruction, but it loads a tag for some sum data type. For example, the
instruction A = gettag(B) loads the tag of B into A, where B should have type
$\mathrm{hastag}(\tau_{tag}, \tau_u)$, for some tag type $\tau_{tag}$ and union type $\tau_u$. The result type of
A is $\mathrm{int}_{=}(\tau_{tag})$.

init:  The  init  is  used  to  initialize  a  record  field,  and  it  maps  to  a  memory  store 
instruction. 

record: The instruction v = record moves allocptr, which resides in a dedicated
register, to variable v, whose type will be a record type. The record instruction
maps to a memory store instruction if variable v is a memory location, and
to a bitwise OR instruction if variable v resides in a register.

inc_allocptr: The instruction inc_allocptr(i) increments the alloc pointer by i, and
invalidates the condition code environment cc_testmem.

call: The instruction call[σ](v) maps to a SPARC ba (branch always) instruction
if v is a label value, or to a jmpl (jump and link) instruction if v is a variable
residing in some register. The substitution σ is applied to the
environments for type checking.

calln:  The  instruction  calln  is  fall-through.  It  maps  to  no  SPARC  instructions. 
The  target  label  must  be  at  the  same  address  as  the  calln  instruction. 

cmpcc :  The  cmpcc  maps  to  a  SPARC  subcc  instruction,  and  updates  the  condition 
code  environment  to  cc_cmp. 

testbox:  The  testbox  instruction  corresponds  to  a  SPARC  subcc  instruction.  It 
compares  a  variable  residing  in  some  register  to  integer  256  to  decide  whether 
the  variable  is  a  pointer  or  not.  The  implicit  assumption  here  is  that  all 
pointers  have  address  values  greater  than  256.  This  instruction  updates  the 
condition  code  environment  to  cc_testbox. 

testmem: The testmem instruction also corresponds to a SPARC subcc
instruction. It compares the alloc pointer to limitptr (the limit pointer as shown
in Figure 3.13) to decide whether there are free memory cells or not. This
instruction updates the condition code environment to cc_testmem.
Branch instruction if: The if statement corresponds to a SPARC branch
instruction. Its type checking rule checks the argument types of both branches, but
does not refine any types or environments in either branch as the iffull, ifboxed,
ifboxedone, and iftag instructions do.

Branch  instruction  iffull:  The  iffull  statement  corresponds  to  the  SPARC  branch 
instruction  bcc.  Its  type  checking  rule  checks  the  argument  types  of  both 
branches,  and  checks  that  the  condition  code  environment  is  cc_testmem  and 
has  enough  free  memory  slots.  The  type  checker  then  remembers  this  fact  in 
the  heap  allocation  environment. 

Branch instruction ifboxed: The ifboxed statement corresponds to the SPARC
branch instruction bcc. Its type checking rule checks the argument types of
both branches, and checks that the condition code environment is cc_testbox
with a valid type. The type checker then refines the type of the variable
being tested in each branch, depending on whether it is boxed or not.

For example, the instruction A = testbox(B) sets the type of variable A
to $(0 \cap \tau{\uparrow})$, where 0 is the de Bruijn index implementation of a fresh type
variable, $\tau$ is the type of value B, and $\uparrow$ is the shift operator in the explicit
substitution calculus [Abadi et al., 1990]. The testbox instruction binds a
fresh type variable. After this instruction, the new environments, including
typing environment $\phi$, heap allocation environment $\mathcal{H}$, and condition code
environment cc, have one new type variable and its kind. The condition code
environment is set to cc_testbox(0).

In the type checking rule of instruction ifboxed (v) then (v1, l1) else (v2, l2), we
check that the condition code environment is cc_testbox($\tau$), and the type of v
is $\tau \cap \mathrm{sum}(\tau_r, \tau_s)$ whose first component $\tau$ matches the type in the condition
code environment. Then we refine the type of v to either $\exists.\mathrm{hastag}(0 \cap \tau{\uparrow})$ or
$\tau_r$ and bind them to v1 or v2, respectively, depending on whether v is boxed
or not. We also check that the argument types match for branches to labels
l1 and l2. We use type variables (de Bruijn indices in our implementation)
and intersection types to preserve type safety when the testing of
boxedness and the branching on boxedness are separate instructions. We use a
similar trick to type check pairs of testmem and iffull instructions and pairs of
cmpcc and iftag instructions. Chen and Tarditi [2005] have subsequently used
this technique to type check virtual-table method lookup and call in
object-oriented languages.

Branch instruction ifboxedone: The ifboxedone instruction is a special case of
ifboxed. Its type checking rule does an additional check that the type $\tau_s$ above
is not a union type; that is, there is only one boxed case in the sum type. In
this case, the compiler optimizes the data representation of the sum data type
to save the tag field and remove one layer of boxing as shown in Figure 3.9.

Branch instruction iftag: The iftag (π) {v} then (v1, l1) else (v2, l2) statement
corresponds to a SPARC branch instruction. The cmpcc instruction compares
a variable to the tag of a sum data type (tagged union) and sets the
condition code (environment) to cc_cmp. The type checking rule of iftag checks the
argument types of both branches, and checks that the condition code
environment is cc_cmp($\tau_{tag}$, i) and matches v’s type $\mathrm{hastag}(\tau_{tag}, \tau_u)$. Then the type of
v is refined depending on whether or not the tag matches, and the new type
is bound to v1 or v2, respectively.
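The sethi entry in the list above says that the immediate lands in the high 22 bits and the low 10 bits are zeroed. A small sketch of that bit arithmetic follows, modelling a 32-bit register as a Python integer. The helper `load_constant` is an assumption added for illustration: it pairs sethi with an OR of the low 10 bits, which is the usual SPARC idiom for building a full 32-bit constant, and is not taken from the LTAL definition.

```python
MASK32 = 0xFFFFFFFF   # model a 32-bit register


def sethi(imm22):
    """Place a 22-bit immediate in the high 22 bits; the low 10 bits become zero."""
    if not 0 <= imm22 < (1 << 22):
        raise ValueError("immediate must fit in 22 bits")
    return (imm22 << 10) & MASK32


def load_constant(value):
    """Build a 32-bit constant as sethi of the high part, then OR in the low part."""
    high = (value & MASK32) >> 10    # the 22 high bits
    low = value & 0x3FF              # the 10 low bits
    return sethi(high) | low
```

For instance, `sethi(1)` yields 1024, and `load_constant` returns its 32-bit argument unchanged, which shows that the two halves do not overlap.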
The sophisticated rule shown in Section 3.3 is a typical instruction type checking
rule in our actual implementation. For the sake of clarity, we present a simplified
version of the rules actually implemented. The complete instruction typing rules are
listed in Appendix A.2.


3.10  Don’t  Trust  the  Linker! 

To  avoid  the  need  to  reason  about  possible  bugs  in  the  link-loader,  we  arrange  that 
each  compilation  unit  needs  no  link-editing,  and  links  to  others  using  closures,  in 
the  style  of  SML/NJ  [Blume  and  Appel,  1997,  §3].  We  must  avoid  the  need  for 
a  linker  to  do  relocation.  Our  safety  policy  says,  “a  program  is  safe  if,  no  matter 
where  we  load  it  in  memory,  it  will  never  access  an  illegal  address  or  execute  an 
illegal  instruction”  [Appel,  2001]. 

PCC  systems  are  most  useful  in  applications  where  untrusted  code  shares  the 
same  address  space  with  trusted  code;  in  such  situations,  position-independent  code 
is  desirable  because  it  makes  the  linker  flexible. 

Position-independent  code  must  use  relative  addresses  instead  of  absolute  ones. 
The  problem  arises  when  we  move  a  label  into  a  register  or  store  it  in  memory,  to 
make  a  function-pointer  or  a  closure.  The  value  of  the  label  depends  on  where  the 
code  is  loaded. 

We  adopt  the  solution  that  SML/NJ  uses,  but  we  show  how  to  type-check  it. 
Each  function  takes  a  base  parameter,  which  is  the  start  address  of  its  own  machine 
code  in  the  memory.  We  keep  the  base  address  of  the  current  function  in  a  register, 
and calculate the addresses of labels as offsets from base. When a function f is
called, the address f is passed as its own base argument.

In the body of a function f, moving a label g to variable v is implemented as
v = addradd(base, g - f), where g - f is a constant computed by the compiler.
Instruction addradd is translated to the SPARC add instruction, and used only for
address arithmetic.

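\[
\frac{\mathit{LRT};\ \rho;\ \phi \vdash v_1 : \mathrm{addr}(f) \qquad
      \mathit{LRT};\ \rho;\ \phi \vdash v_2 : \mathrm{diff}(g, f) \qquad
      \phi' = (\phi \backslash v),\ v : \mathrm{addr}(g)}
     {\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{v = \mathrm{addradd}(v_1, v_2)\}\ (\rho;\ \mathcal{H};\ \phi';\ cc)}
\quad \text{InstrAddrAdd}
\]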

To type-check position-independent code, we introduce type constructors addr and
diff. The former gives a type to a label and the latter types the difference between
two labels. For example, in the above example v = addradd(base, g - f), variable
base has type addr(f); the compile-time known constant g - f, which is represented
as a value vdiff(g, f), has type diff(g, f); and the typing rule for addradd will give
type addr(g) to v.
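The interplay of addr and diff can be illustrated with a short sketch that models both the typing side and the run-time arithmetic. Everything here is hypothetical scaffolding: the classes `Addr` and `Diff`, the function names, and the label offsets in `OFFSETS` are invented for the illustration; only the rule "addr(f) plus diff(g, f) gives addr(g)" and the use of the function's own address as base come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Addr:
    label: str          # addr(label)


@dataclass(frozen=True)
class Diff:
    target: str         # diff(target, origin) types the constant target - origin
    origin: str


def addradd_type(t1, t2):
    """Typing side: addr(f) combined with diff(g, f) yields addr(g)."""
    if isinstance(t1, Addr) and isinstance(t2, Diff) and t2.origin == t1.label:
        return Addr(t2.target)
    raise TypeError(f"addradd is not typable for {t1} and {t2}")


# Hypothetical offsets of two labels inside one compilation unit.
OFFSETS = {"f": 0x40, "g": 0x9C}


def address_of_g(load_address):
    """Run-time side: compute the address of g from f's base, wherever we are loaded."""
    base = load_address + OFFSETS["f"]            # f receives its own address as base
    difference = OFFSETS["g"] - OFFSETS["f"]      # compile-time constant g - f
    return base + difference                      # what addradd computes
```

Because only the difference g - f is baked into the code, `address_of_g` returns the correct absolute address of g for every load address, and `addradd_type` rejects a difference whose origin label does not match the base.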

When a function f is called in a compilation unit other than where it is defined,
its label is (statically) unknown at the call site. Then the type of its base cannot
be addr. We use existential types to hide the base type; the type of f becomes
$\exists\beta.\mathrm{codeptr}[\alpha : \kappa](m, [\mathrm{base} : \beta, \ldots])$. To make sure that f itself is passed as its base
when f is called, we make f have type $\exists\beta.(\beta \cap \mathrm{codeptr}[\alpha : \kappa](m, [\mathrm{base} : \beta, \ldots]))$.

As an important optimization, when a function is called only by direct jumps
from known locations, it does not need its own base argument; it can use the base
of one of its known callers. This avoids addradd instructions in local loops and
branches.
    Category                                 Constructors & operators

    SPARC instruction constructors                  196
    SPARC instruction decoding rules                263
    Coercion operators                               32
    Coercion rules                                   46
    Explicit substitution calculus                   59
    Environment constructors                         63
    LTAL type operators                              21
    LTAL instruction operators                       43
    Type refinement rules                            82
    Kind operators                                   17
    Kind checking rules                              36
    Type well-formedness rules                       41
    Local environment management                     86
    Static arithmetic calculations                   55
    Rules for parsing LRT maps                       16
    Structural type matching heuristics              38
    Branch checking rules                            17
    LTAL instruction constructors                    52
    Instruction typing rules                         53

    Total                                         1,216

Table 3.1: LTAL calculus statistics.


3.11  Measurements 

3.11.1  Size 

The LTAL calculus is a large engineering artifact, just like the compiler that
produces it and the SPARC machine that consumes it. It comprises (at the current
state of implementation) approximately 1200 operators and rules. The statistics
are summarized in Table 3.1. The first column gives a brief description of the
various constructors and rules. The second column shows the number of constructors
and rules.
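As a quick consistency check on the figure of "approximately 1200", the per-category counts of Table 3.1 can be summed directly. The snippet below only restates the table; the variable names are arbitrary.

```python
# Per-category counts transcribed from Table 3.1.
COUNTS = {
    "SPARC instruction constructors": 196,
    "SPARC instruction decoding rules": 263,
    "Coercion operators": 32,
    "Coercion rules": 46,
    "Explicit substitution calculus": 59,
    "Environment constructors": 63,
    "LTAL type operators": 21,
    "LTAL instruction operators": 43,
    "Type refinement rules": 82,
    "Kind operators": 17,
    "Kind checking rules": 36,
    "Type well-formedness rules": 41,
    "Local environment management": 86,
    "Static arithmetic calculations": 55,
    "Rules for parsing LRT maps": 16,
    "Structural type matching heuristics": 38,
    "Branch checking rules": 17,
    "LTAL instruction constructors": 52,
    "Instruction typing rules": 53,
}

total = sum(COUNTS.values())
print(total)  # 1216, matching the Total row of the table
```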
A typical large rule, such as the one shown in Section 3.3, is quantified over a
dozen  variables  and  has  a  dozen  premises.  In  all,  the  current  LTAL  type  checker 
is  4,163  lines  of  (non-blank,  non-comment)  Prolog-like  source  code.  The  machine- 
checked  proof  of  the  soundness  of  all  the  LTAL  rules  (which  is  nearing  completion) 
is  over  143,400  lines  of  higher-order  logic  as  represented  in  the  Twelf  system.  The 
axioms  comprise  1,957  lines,  almost  all  of  which  is  the  specification  of  the  SPARC 
instruction  set  architecture. 

The compiler from core ML to LTAL and SPARC machine code is written in ML;
its size (including blank lines and comments) is 50k lines of the SML/NJ (version
110.35) front end (unmodified); 1.8k lines of code copied and modified from the
implementation of the SML/NJ interactive top-level loop; 2.7k lines to translate
FLINT to NFLINT; 7.8k lines to translate NFLINT to LTAL; 1.2k lines to interface
to MLRISC; and approximately 50k lines of the MLRISC system⁵ itself, of which
400 lines are new or modified to support our more-general annotation interface.

3.11.2  Performance 

We compared our performance⁶ to that of SML/NJ (version 110.35) on two small
benchmarks: Life (adapted from the Standard ML benchmark suite) and RedBlack,
which uses balanced trees to do queries on integer sets. The results are shown in
Table 3.2.

Our compile time is not competitive (2.998 seconds to compile Life compared
to 0.49 seconds for the production release of SML/NJ); we have not engineered our
compiler algorithms as necessary for a production compiler. Run time is almost as
good as SML/NJ. Currently we do not garbage collect; SML/NJ spends 0.02% of
its time garbage-collecting on these benchmarks. SML/NJ’s better performance is
probably because it has more sophisticated liveness-based closure conversion and fills
branch-delay slots. Other than that, the optimizations performed by the FPCC/ML
compiler are just about as sophisticated and comprehensive as those of SML/NJ.

    Benchmark                        redblack        life

    SML/NJ compile time (s)             0.300       0.490
    SML/NJ run time (s)                 0.013       0.262
    FPCC compile time (s)               0.955       2.998
    FPCC run time (s)                   0.014       0.407
    FPCC/SMLNJ slowdown                 1.036       1.555
    Safety check time (s) in
        SICStus                         0.183       0.432
        Flit                            1.32        2.19
        Twelf                           1018        >3600
    SPARC instrs.                       870         1816
    LTAL tokens                         34278       57670
    Coercion tokens                     17%         23%

Table 3.2: FPCC/ML compiler, LTAL, and Flit performance.

⁵ The MLRISC software has several other analyses, optimizations, and target machine
specifications that we did not use and that we don’t count here.

⁶ The compile and run time is measured on a Sun UltraSPARC E250, 400 MHz. The safety
checking time is measured on a 2.2 GHz Pentium 4.

To measure safety check time, we translate our lemmas into Prolog rules and
time the execution in SICStus Prolog. As an alternative, we have built a
minimal-size interpreter, Flit, for syntax-directed lemmas; it is much simpler than Prolog
because it doesn’t require backtracking [Appel et al., 2002; Wu et al., 2003]. Twelf
also builds in a logic programming engine. We measured the safety checking time in
all three systems. Twelf is not designed for performance, but its advanced features
make it a convenient tool for us to develop machine-checkable proofs in LF. Flit is
about five times slower than the optimizing SICStus Prolog, and is fast enough for
the intended application. The performance of proof-checking in Flit will be further
discussed in Chapter 5.

Simple encodings should be able to represent LTAL in a few bits per token, so
the LTAL expression should not be significantly bigger than the machine-language
program. At present, however, we represent LTAL expressions as LF terms, and
encode them in the form of directed acyclic graphs (DAGs) [Appel et al., 2002,
2003]. Eliminating the LTAL coercions (thus requiring some backtracking in the
type checker) could save about 20% in LTAL size. The builders of SpecialJ [Colby
et al., 2000] and TALx86 [Morrisett et al., 1999a] have devoted substantial effort
to reducing proof size, not just removing coercions but getting the checker to
reconstruct other data as well. Clearly, there is some engineering to be done in this
respect, although we would not want to complicate any part of the checker that is
in the trusted base.


3.12  Related  Work 

TAL [Morrisett et al., 1998, 1999b] demonstrated the idea of typed assembly
language, but was too limited for practical programming languages. Extensions of
this work supported stack allocation [Morrisett et al., 2002] and implemented a
more realistic calculus (TALx86) [Morrisett et al., 1999a] for compiling a safe C-like
language to Intel IA32 assembly language. DTAL [Xi and Harper, 2001] added a
restricted form of dependent types to TAL to support array bound check elimination
and datatype tag discrimination. These implementations have soundness proved by
hand about abstractions of subsets of the systems that are actually implemented;
the proofs cannot be machine-checked. These TALs each have a macroinstruction
“malloc” for heap allocation, and TALx86 has another macro “btagi” which tests
tags and branches.

Hamid et al. [2002] proposed a syntactic approach to building machine-checkable
foundational proofs. They designed Featherweight Typed Assembly Language (FTAL),
mapped each valid machine state to a well-typed FTAL program, and related
transition of machine states to evaluation of FTAL programs by a machine-checked
syntactic metatheorem. Crary [2003] has built a more substantial TALT, with a
machine-checked syntactic metatheorem proving progress and preservation; he uses
simulation to relate his typed calculus to the “bare machine” untyped step relation.


Chapter  4 


Machine-Checkable  Soundness 
Proofs  for  LTAL 


In  this  section,  we  give  an  overview  of  the  semantic  techniques  we  used  to  build  a 
machine-checkable  soundness  proof  for  LTAL.  We  give  semantic  models  to  types, 
instructions,  and  typing  judgements,  and  prove  type  checking  rules  as  lemmas  with 
respect  to  machine  specification  and  logic  axioms.  The  semantic  models  allow  a 
type  checking  derivation  to  be  interpreted  as  a  machine-checkable  safety  proof  at 
the  machine  level. 


4.1  Overview 

In both Necula’s PCC [Necula and Lee, 1996; Necula, 1997] and Morrisett’s TAL
[Morrisett et al., 1998, 1999b] systems, type checking rules are trusted as axioms.
In other words, the type systems used in their systems do not have a
(machine-checkable) soundness proof. For example, in the TAL system, there are 13 kinds
of typing judgements, and many typing rules are of similar complexity to the
following one:

\[
\frac{\Psi;\ \Delta;\ \Gamma \vdash_{\mathrm{TAL}} v : \forall[\,].\Gamma' \qquad
      \Delta \vdash_{\mathrm{TAL}} \Gamma \le \Gamma'}
     {\Psi;\ \Delta;\ \Gamma \vdash_{\mathrm{TAL}} \mathrm{jmp}\ v}
\quad (\text{s-jmp})
\]


This rule is intuitively “correct” based on the semantics of the jump
instruction, and there is a paper-and-pencil proof of soundness [Morrisett et al., 1999b].
However, the TAL described in their published paper is not the TALx86 [Morrisett
et al., 1999a] that they actually implemented. Any misunderstanding of the
semantics could lead to errors in the type system. League et al. [2003] found an unsound
proof rule in the SpecialJ [Colby et al., 2000] type system. In the process of refining
our own TAL [Chen et al., 2003], we routinely find and fix bugs that can lead to
unsoundness.

Since errors in the Trusted Computing Base (TCB) can be exploited by malicious
code, it is useful to minimize the TCB. A foundational approach is to move the
entire type system out of the TCB by proving its soundness and by verifying that
type-checking implies the safety theorem. We give models to types and judgements so
that both typing rules and the type-safety theorem can be proved and mechanically
verified in a theorem-proving system [Appel, 2001; Wu et al., 2003; Tan et al., 2004].

The  rest  of  this  chapter  is  organized  as  follows.  We  first  introduce  the  logic 
and  logical  framework  we  used  to  build  machine  checkable  proofs  in  Section  4.2. 
We  then  describe  the  machine  architecture  specification  in  Section  4.3.  The  safety 
specification  is  presented  in  Section  4.4.  After  that,  we  build  semantic  models  for 
types  in  Section  4.5,  and  prove  the  soundness  of  LTAL  in  Section  4.6.  Finally,  we 
present  our  implementation  in  Section  4.7  and  discuss  related  work  in  Section  4.8. 


4.2 Logic and Logical Framework

In  order  to  build  machine  checkable  proofs,  one  must  choose  a  formal  logic  and  an 
implementation  of  the  logic  to  manipulate  proofs  written  in  the  logic.  We  choose 
higher-order  logic  as  the  object  logic  since  it  is  expressive  and  permits  concise  proofs. 
The  LF  logical  framework  [Harper  et  al.,  1993]  is  chosen  as  the  meta-logic  to  encode 
higher-order  logic. 

LF is a dependent type theory based on λ-calculus with type families and
βη-equality. It has three levels of terms: objects, types, and kinds. Types classify
objects and kinds classify type families. LF is a framework for defining logics
[Harper et al., 1993]. The framework is general enough to represent logics of
interest; we use it to encode higher-order logic [Appel, 2001].

tp    : type.
tm    : tp -> type.
form  : tp.
num   : tp.

arrow : tp -> tp -> tp.  %infix right 14 arrow.
pair  : tp -> tp -> tp.
pf    : tm form -> type.

In LF, “type” is a keyword for declaring an LF type (meta-logical type), and ->
is the meta-logical function type. In the above LF code, tp is declared as a
type in the meta-logic LF, and it classifies object-logic types. Our object logic
has primitive types form and num for formulas and numbers, respectively. The
constructor tm converts a term of object-logic type T (T is of meta-logical type
tp) into a term of meta-logical type tm T. For any formula A of meta-logical type
tm form, proofs of A are encoded as terms of meta-logical type pf (A). The
constructors arrow and pair are used to build function types and tuples,
respectively, in the object logic. The constructor arrow is declared infix to make
it more readable.

Then we introduce constructors and definitions in our object logic, and prove
lemmas based on them.

lam : (tm T1 -> tm T2) -> tm (T1 arrow T2).
@   : tm (T1 arrow T2) -> tm T1 -> tm T2.
%infix left 20 @.

imp : tm form -> tm form -> tm form.
%infix right 10 imp.
forall : (tm T -> tm form) -> tm form.

imp_i : (pf A -> pf B) -> pf (A imp B).
imp_e : pf (A imp B) -> pf A -> pf B.

forall_i : ({x:tm T} pf (A x)) -> pf (forall A).
forall_e : pf (forall A) -> {x:tm T} pf (A x).

and : tm form -> tm form -> tm form =
  [a] [b] forall [c] (a imp b imp c) imp c.
%infix right 12 and.

and_i : pf A -> pf B -> pf (A and B) =
  [p1: pf A]
  [p2: pf B]
  forall_i [c: tm form]
  imp_i [p3] imp_e (imp_e p3 p1) p2.

and_e1 : pf (A and B) -> pf A =
  [p1: pf (A and B)]
  imp_e (forall_e p1 A)
        (imp_i [p2: pf A] imp_i [p3: pf B] p2).

imp_trans : pf (A imp B) -> pf (B imp C) -> pf (A imp C) =
  [p1] [p2] imp_i [p3] imp_e p2 (imp_e p1 p3).

imp_refl : pf (A imp A) = imp_i [p1] p1.

imp_true : pf B -> pf (A imp B) = [p1] imp_i [p2] p1.

The lam, @, imp, and forall are constructors for λ-abstraction, function
application, logical implication, and universal quantification in our object logic.
Next, we define the introduction and elimination rules (e.g., imp_i and imp_e) for
these constructors. Finally, we can introduce definitions and lemmas based on
constructors previously defined. The definition and and its introduction and
elimination rules (and_i and and_e1) are type checked for validity. The lemmas
imp_trans, imp_refl, and imp_true are proved and checked.

The proof checking in LF is based on the formulae-as-types principle (also known
as the Curry-Howard correspondence) [Howard, 1980]. A formula or theorem is
encoded as a type in the LF type theory, and a proof of the theorem is an LF term
of the LF type that encodes the theorem. Thus, proof checking in the object logic
is reduced to LF type checking.
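To make the formulae-as-types idea concrete, here is a minimal sketch in Python of a checker for the implicational fragment only. It is an illustration, not how Twelf or Flit is implemented: the names `Imp`, `Var`, `Lam`, `App`, `infer`, and `check` are hypothetical, atoms are plain strings, and quantifiers and dependent types are left out. A proof term is accepted exactly when its inferred type equals the formula it claims to prove.

```python
from dataclasses import dataclass
from typing import Union

# Formulas: atoms are plain strings; Imp(a, b) stands for "a imp b".
@dataclass(frozen=True)
class Imp:
    left: "Formula"
    right: "Formula"

Formula = Union[str, Imp]

# Proof terms: a hypothesis, imp_i (a lambda), or imp_e (an application).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:                      # plays the role of: imp_i [name : hyp] body
    name: str
    hyp: Formula
    body: "Proof"

@dataclass(frozen=True)
class App:                      # plays the role of: imp_e fun arg
    fun: "Proof"
    arg: "Proof"

Proof = Union[Var, Lam, App]

def infer(proof, ctx=None):
    """Return the formula proved by `proof`, or raise TypeError."""
    ctx = ctx or {}
    if isinstance(proof, Var):
        if proof.name not in ctx:
            raise TypeError(f"unbound hypothesis {proof.name}")
        return ctx[proof.name]
    if isinstance(proof, Lam):
        body = infer(proof.body, {**ctx, proof.name: proof.hyp})
        return Imp(proof.hyp, body)
    if isinstance(proof, App):
        fun = infer(proof.fun, ctx)
        arg = infer(proof.arg, ctx)
        if not isinstance(fun, Imp) or fun.left != arg:
            raise TypeError("imp_e applied to mismatched formulas")
        return fun.right
    raise TypeError("unknown proof term")

def check(proof, formula):
    """Proof checking as type checking: does `proof` have type `formula`?"""
    try:
        return infer(proof) == formula
    except TypeError:
        return False

# The imp_trans lemma, with its two premises abstracted as hypotheses:
# [p1] [p2] imp_i [p3] imp_e p2 (imp_e p1 p3)
imp_trans = Lam("p1", Imp("A", "B"),
            Lam("p2", Imp("B", "C"),
            Lam("p3", "A",
                App(Var("p2"), App(Var("p1"), Var("p3"))))))
goal = Imp(Imp("A", "B"), Imp(Imp("B", "C"), Imp("A", "C")))
```

Calling `check(imp_trans, goal)` accepts the term, while a term that uses an unbound hypothesis or applies imp_e to mismatched formulas is rejected. The checker never searches for a proof; it only validates the one it is given, which is what keeps this style of checker small.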

Twelf [Pfenning and Schürmann, 1999, 2002] is an implementation of LF. We
use Twelf for our development of machine-checkable proofs. Twelf has many useful
features, such as type inference and mode analysis, which make it a convenient tool
for us to develop and manipulate machine-checkable proofs in higher-order logic
(encoded in the meta-logic LF).

With its many advanced features, Twelf is a very good choice for our development.
Twelf is not, however, trustworthy or minimal in terms of system size and features.
Because we want to build a high-assurance system and don’t want to include a
large proof checker in the TCB, we implement a simple yet efficient LF proof
checker, Flit, which is presented in Chapter 5. Flit also implements a simple logic
programming engine for efficient proof checking [Wu et al., 2003].


4.3  Machine  Instruction  Specification 


Our machine model consists of a set of formulas in higher-order logic that specify
the decoding and operational semantics of instructions. Our safety policy specifies
which addresses may be loaded and stored by the program (memory safety) and
defines what code safety means. Our machine model and safety policy are
trusted and are small enough to be “verifiable by inspection.”

In our model, a machine state (r, m) consists of a register bank r and a memory
m, which are modeled as functions from numbers (register numbers and addresses)
to numbers (contents). A machine instruction is modeled by a relation between
two machine states (r, m) and (r', m') before and after execution of the instruction
[Michael and Appel, 2000]. For example, the add instruction r_i ← r_j + r_k is
modeled as the following relation:¹


\[
\mathrm{add}(i,j,k) = \lambda r,m,r',m'.\;\; r'(i) = r(j) + r(k) \;\land\; (\forall x \neq i.\; r'(x) = r(x)) \;\land\; m' = m
\]
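The relational reading of an instruction can be sketched in Python as a predicate over a before-state and an after-state. This is only an illustration under stated assumptions: register banks and memories are finite dictionaries rather than total functions, arithmetic is on unbounded integers, and the function name `add` simply mirrors the definition above.

```python
def add(i, j, k):
    """Return the relation add(i, j, k) as a predicate over (r, m, r2, m2)."""
    def relation(r, m, r2, m2):
        # Assumption: both register banks are dicts over the same register numbers.
        if r2.keys() != r.keys():
            return False
        result_ok = r2[i] == r[j] + r[k]                        # r'(i) = r(j) + r(k)
        others_ok = all(r2[x] == r[x] for x in r if x != i)     # forall x != i
        return result_ok and others_ok and m2 == m              # m' = m
    return relation

# A before-state and three candidate after-states.
r = {1: 5, 2: 7, 3: 0}
m = {0: 42}
good = add(3, 1, 2)(r, m, {1: 5, 2: 7, 3: 12}, m)
clobbered = add(3, 1, 2)(r, m, {1: 5, 2: 0, 3: 12}, m)     # register 2 changed
memory_changed = add(3, 1, 2)(r, m, {1: 5, 2: 7, 3: 12}, {0: 0})
```

Only the first candidate is related to the before-state: the relation pins down the destination register, forbids changes to every other register, and requires memory to be untouched.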


Since  we  want  to  prove  safety  of  machine  code,  which  is  just  a  sequence  of 
integers  (representing  machine  instructions),  we  must  model  the  decode  relation  to 
connect  instruction  words  to  their  actual  meanings.  The  decode  rule  in  Section  3.3.1 
illustrates  the  idea,  but  we  need  to  model  the  decode  relation  for  every  instruction 
of  the  machine.  The  decode  relation  is  specified  as  follows:  Some  number  w  decodes 
to  an  instruction  instr  if  [Michael  and  Appel,  2000;  Appel,  2001] 


¹ Our step relation first increments the program counter pc, then executes an
instruction. Thus, the semantics of the add instruction does not include the
semantics of incrementing the pc.
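
\[
\begin{array}{l}
\mathrm{decode}(w, \mathit{instr}) = \\
\quad (\exists\, i,j,k.\\
\qquad 0 \le i < 2^{5} \;\land\; 0 \le j < 2^{5} \;\land\; 0 \le k < 2^{5} \;\land\\
\qquad w = 3\cdot 2^{26} + i\cdot 2^{21} + j\cdot 2^{16} + k\cdot 2^{0} \;\land\\
\qquad \mathit{instr} = \mathrm{add}(i,j,k))\\
\quad \lor\; (\exists\, i,j,c.\\
\qquad 0 \le i < 2^{5} \;\land\; 0 \le j < 2^{5} \;\land\; 0 \le c < 2^{16} \;\land\\
\qquad w = 12\cdot 2^{26} + i\cdot 2^{21} + j\cdot 2^{16} + c\cdot 2^{0} \;\land\\
\qquad \mathit{instr} = \mathrm{load}(i,j,c))\\
\quad \lor\; \ldots
\end{array}
\]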

where  the  ellipsis  denotes  many  other  instructions  of  the  machine. 
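The two disjuncts shown above amount to bit-field extraction. The following Python sketch covers only those two illustrative encodings (opcode 3 for add, opcode 12 for load); the tuple representation of instructions and the helper name `encode_add` are assumptions made for the example, and every other word is treated as undecodable.

```python
def decode(w):
    """Decode word w into ("add", i, j, k) or ("load", i, j, c), else None."""
    opcode = w >> 26            # everything above bit 25
    i = (w >> 21) & 0x1F        # bits 25..21, so 0 <= i < 2**5
    j = (w >> 16) & 0x1F        # bits 20..16, so 0 <= j < 2**5
    low = w & 0xFFFF            # bits 15..0
    if opcode == 3 and low < 2**5:
        return ("add", i, j, low)      # here low plays the role of k
    if opcode == 12:
        return ("load", i, j, low)     # here low plays the role of c
    return None

def encode_add(i, j, k):
    """Hypothetical helper: build the word that the add disjunct describes."""
    return 3 * 2**26 + i * 2**21 + j * 2**16 + k * 2**0

add_instr = decode(encode_add(1, 2, 3))
load_instr = decode(12 * 2**26 + 4 * 2**21 + 5 * 2**16 + 1000)
not_an_add = decode(3 * 2**26 + 32)    # low field is 32, outside 0..31
```

Because the fields occupy disjoint bit ranges, each word decodes to at most one instruction, so the relation is functional on the words it accepts.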

The machine operational semantics is modeled by a step relation ↦ that steps
from one state (r, m) to another state (r', m') [Michael and Appel, 2000], where
the state (r', m') is the result of first decoding the current machine instruction,
incrementing the program counter and then executing the machine instruction.

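\[
\begin{array}{rl}
(r,m) \mapsto (r',m') = \exists\, \mathit{instr}. & \mathrm{decode}(m(r(\mathrm{pc})), \mathit{instr})\\
\land & \mathrm{upd}(r, \mathrm{pc}, r(\mathrm{pc}) + 4, r'')\\
\land & \mathit{instr}(r'', m, r', m')
\end{array}
\]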

where the upd predicate increments the program counter pc (assuming the
instruction size is 4), and the resulting register bank is r''.

An  important  property  of  our  step  relation  is  that  it  is  deliberately  partial:  It 
omits  any  step  that  would  be  illegal  under  the  safety  policy.  For  example,  the  load 
instruction  is  specified  by 

\[
\begin{array}{rl}
\mathrm{load}(i,j,c) = \lambda r,m,r',m'. & r'(i) = m(r(j) + c) \;\land\; (\forall x \neq i.\; r'(x) = r(x))\\
\land & m' = m \;\land\; \mathrm{readable}(r(j) + c).
\end{array}
\]

Suppose in some state (r, m) the program counter points to a load instruction
that would, if executed, load from an address that is unreadable according to the
safety policy. Then, since our load instruction requires that the address must be
readable, there will not exist (r', m') such that (r, m) ↦ (r', m').
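The partiality of the step relation can be sketched in Python as a function that returns no successor for a stuck state. This is a toy model under stated assumptions: only the two illustrative encodings (add and load) are decoded, the state is a pair of dictionaries, the key `PC` for the program counter is hypothetical, and the `readable` policy is passed in as a parameter. Since both instructions are deterministic, a function returning either a state or `None` is enough to represent the relation here.

```python
PC = "pc"  # hypothetical key for the program counter in the register bank

def decode(w):
    """Decode a word using the two illustrative encodings from the text."""
    opcode, i, j, low = w >> 26, (w >> 21) & 0x1F, (w >> 16) & 0x1F, w & 0xFFFF
    if opcode == 3 and low < 2**5:
        return ("add", i, j, low)
    if opcode == 12:
        return ("load", i, j, low)
    return None  # every other word is treated as illegal in this sketch

def step(r, m, readable):
    """Return the successor (r2, m2) of state (r, m), or None if it is stuck."""
    w = m.get(r[PC])
    instr = decode(w) if w is not None else None
    if instr is None:
        return None                      # no legal instruction at pc
    r1 = dict(r)
    r1[PC] = r[PC] + 4                   # upd: increment pc first
    op, i, j, x = instr
    r2 = dict(r1)
    if op == "add":
        r2[i] = r1[j] + r1[x]
    else:                                # load
        addr = r1[j] + x
        if not readable(addr) or addr not in m:
            return None                  # unsafe load: no successor state
        r2[i] = m[addr]
    return r2, m                         # memory is unchanged by add and load

# A one-instruction program: load register 1 from address r(2) + 100.
load_word = 12 * 2**26 + 1 * 2**21 + 2 * 2**16 + 100
m = {0: load_word, 100: 7}
r = {n: 0 for n in range(32)}
r[PC] = 0
ok = step(r, m, readable=lambda a: a == 100)
stuck = step(r, m, readable=lambda a: False)
```

With address 100 readable, the load succeeds and the program counter advances by 4; with nothing readable, the same state has no successor, which is how the safety policy is folded into the semantics.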

4.4  Safety  Specification 

As stated in the previous section, our step relation is deliberately partial; some
states, in which the program counter r(pc) points to an illegal instruction or r(pc)
points to a legal machine instruction that violates our safety policy, have no
successor states. This mixing of machine semantics and safety policy follows the
standard practice in type theory [Wright and Felleisen, 1994] so that we can get a
clean and uniform definition of the safety property.

Using  the  partial  step  relation,  we  can  define  a  safe  machine  state  as  a  state  that 
cannot  lead  to  a  stuck  state. 

\[
\text{safe-state}(r,m) = \forall r',m'.\;\; (r,m) \mapsto^{*} (r',m') \;\Rightarrow\; \exists r'',m''.\; (r',m') \mapsto (r'',m'')
\]

where ↦* denotes zero or more steps.

To show safe-state(r, m), it suffices to prove that the state is “safe for n steps,”
for any natural number n.

\[
\text{safe-n-state}(n,r,m) = \forall r',m'.\; \forall j < n.\;\; (r,m) \mapsto^{j} (r',m') \;\Rightarrow\; \exists r'',m''.\; (r',m') \mapsto (r'',m'')
\]

where ↦^j denotes j steps being taken.
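For a deterministic step function, the bounded safety predicate can be checked by simply running the machine. The sketch below assumes a step function that returns either the successor state or `None` for a stuck state; the `countdown` machine is a hypothetical toy used only to exercise the definition.

```python
def safe_n_state(n, state, step):
    """True if every state reached from `state` in j < n steps has a successor."""
    for _ in range(n):
        successor = step(state)
        if successor is None:
            return False        # reached a stuck state within the bound
        state = successor
    return True

def countdown(s):
    """Toy machine: counts down to 0 and is stuck there."""
    return s - 1 if s > 0 else None

safe_for_3 = safe_n_state(3, 3, countdown)   # states 3, 2, 1 all have successors
safe_for_4 = safe_n_state(4, 3, countdown)   # state 0 is reached after 3 steps
```

Running the machine can only establish safety for one particular n at a time. The unbounded property safe-state requires a proof that holds for every n, which is what the indexed model in the next section is designed to supply.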

A  machine-language  program  is  just  a  sequence  of  integers  (each  representing  a 
machine  instruction);  we  state  that  a  program  p  is  loaded  at  a  location  l  in  memory 
m  if 


\[
\mathrm{loaded}(p, m, l) = \forall i \in \mathrm{dom}(p).\;\; m(i + l) = p(i)
\]
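As a small illustration, the loaded predicate can be written in Python over dictionaries. The representation is an assumption: the program p maps offsets to instruction words, and the offsets are taken to be multiples of 4 to match the instruction size used by the step relation. The two words are the first two of the example program given later in this section.

```python
def loaded(p, m, l):
    """True if memory m contains program p starting at address l."""
    return all(m.get(i + l) == word for i, word in p.items())

p = {0: 2551193600, 4: 2181292040}
m = {100: 2551193600, 104: 2181292040, 108: 0}
at_100 = loaded(p, m, 100)
at_104 = loaded(p, m, 104)
```

The predicate constrains only the addresses covered by dom(p); the rest of memory is left unconstrained.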

Finally we define program safety as follows. Assume that programs are written
in position-independent code. A program is safe if, no matter where we load it in
memory, we get a safe state whenever the machine state meets the initial
precondition φ0 [Appel, 2001]:


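\[
\begin{array}{l}
\mathrm{safe}(p) = \\
\quad \forall r, m, \mathit{start}.\;\; \mathrm{loaded}(p, m, \mathit{start}) \;\land\; r(\mathrm{pc}) = \mathit{start} \;\land\; (m,r) : \phi_0\\
\qquad \Rightarrow\; \text{safe-state}(r, m)
\end{array}
\]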

The initial precondition φ0 specifies, among other things, the initial state of the
register bank and memory. It also states that the return address is a label to which
it is safe to jump. Our current initial precondition is simple enough that it can be
described in our logic without type constructors, since LTAL or other types are
not in the TCB. We are currently investigating how to augment the TCB with
some type constructors so that a more sophisticated initial precondition and
runtime interface can be specified.

The safety theorem is parametrized by the application program. We only need
two operators to specify a machine-language program, that is, cons and nil for
sequences of integers representing machine instructions. Let ; be the “cons”
operator. For some machine-language program²

2551193600 ;
2181292040 ;
2214748172 ;
2416058369 ;
2450522113 ;
2176860160 ;
16777216 ;
nil


the  safety  theorem  is 

safe  (  2551193600  ;  . . .  ;  16777216  ;  nil  ) 


Suppose  PROOF  is  a  proof  of  the  above  theorem.  To  check  the  validity  of  the 
proof,  we  type  check  the  following  LF  term: 

safe_thm: pf (safe (2551193600 ; ... ; 16777216 ; nil)) = PROOF.


Appel et al. [2003] measured the size of the safety specification in our system. The
result is shown in Table 4.1. In our safety specification, there are 1,206 definitions
encoded in 1,865 lines of code in Twelf. The definitions used directly or indirectly
to specify the safety theorem need to be trusted, and thus are part of the safety
specification. Therefore, all definitions and constructors up to the definition of safe
are part of the trusted code base. On the other hand, definitions specified after the
definition of safe and used in the proof of the safety theorem do not need to be
trusted since they are defined and checked for validity before they are used in other
definitions and proofs.

Safety Specification    Lines    Definitions
Logic                     135             61
Arithmetic                160             94
Machine Syntax            460            334
Machine Semantics       1,005            692
Safety Predicate          105             25
Total                   1,865          1,206

Table 4.1: Safety specification.

² This is the SPARC program compiled from the ML function fun f(x)=x+1 by our
FPCC/ML compiler.


4.5  Semantic  Models  of  Types 

In this section, we give a brief description of the semantic models of types [Appel
and Felty, 2000; Appel and McAllester, 2001].

Appel and Felty [2000] build set-theoretic models for types. A state is a pair
(a, m), where m is a memory (including the register bank) and a is the set of
allocated addresses of dynamic memory allocation. A value is a pair (s, x) of a state
s and an integer x (typically representing an address or root pointer). This model
can handle records, address arithmetic, function pointers, intersection and union
types, covariant recursive types, etc., but cannot handle contravariant recursive
types.

Appel and McAllester [2001] invented the indexed model of types that can
describe contravariant recursive types. In the indexed model, a type is not a set of
values; instead, it is a set of pairs (k, v), where k is the approximation index (a
non-negative integer) and v is a value. Intuitively, a pair (k, v) ∈ τ means the value
v has type τ within k steps of computation; that is, it k-approximately belongs to
type τ.

\begin{align*}
\mathrm{int} &= \{(k,m,x) \mid \mathit{true}\}\\
\mathrm{int}_{=}(n) &= \{(k,m,x) \mid x = n\}\\
\mathrm{box}(\tau) &= \{(k,m,x) \mid x \in \mathrm{dom}(m) \land \mathrm{readable}(x) \land (k-1, m, m(x)) \in \tau\}\\
\mathrm{field}(i,\tau) &= \{(k,m,x) \mid (x+i) \in \mathrm{dom}(m) \land \mathrm{readable}(x+i) \land (k-1, m, m(x+i)) \in \tau\}\\
\mathrm{codeptr}(\phi) &= \{(k,m,x) \mid \forall j,r.\; j < k \land r(\mathrm{pc}) = x \land (m,r) :_j \phi \Rightarrow \text{safe-n-state}(j,r,m)\}
\end{align*}

Figure 4.1: The indexed model of types.

Types are defined such that they are closed under approximation; that is, if
(k, v) ∈ τ, then (j, v) ∈ τ for any j < k. We use v :_k τ as syntactic sugar for
(k, v) ∈ τ. We write v : τ to mean v :_k τ is true for any k. A value v is a tuple
(a, m, x) of the set of allocated addresses, memory (including the register bank), and
an integer (typically denoting the root address). For the sake of simplicity,
sometimes we omit the allocated-address set, and use dom(m) instead when
necessary. The indexed model of some types is shown in Figure 4.1.

Any value is of type int since any memory content can be viewed as an integer
or binary number. The type int=(n) specifies that the integer is exactly n. The
type box(τ) states that the root pointer is a readable address whose content is of
type τ under approximation k − 1. This is because it takes one computation step
(a memory load instruction) to dereference a boxed value. The meaning of the field
type is similar except that there is an offset. The codeptr type states that the root
address is a label to which it is safe to jump. Specifically, it says that if the current
machine state satisfies precondition φ with any index j < k, it is safe to run j steps
starting from the root address x.

\begin{align*}
\top_{\phi} &= \{(k,m,x) \mid \mathit{true}\}\\
\bot_{\phi} &= \{(k,m,x) \mid \mathit{false}\}\\
\{n : \tau\} &= \{(k,m,x) \mid (k, m, x_n) \in \tau\}\\
\phi_1 \cap \phi_2 &= \{(k,m,x) \mid (k,m,x) \in \phi_1 \land (k,m,x) \in \phi_2\}\\
\phi[n \mapsto \tau] &= \{(k,m,x) \mid (k, m, x[n \mapsto y]) \in \phi \land (k, m, x_n) \in \tau\}
\end{align*}

Figure 4.2: The indexed model of environments (vector values).
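The scalar types of Figure 4.1 can be read as predicates on triples (k, m, x). The Python sketch below transcribes int, int=(n), box, and field literally; it is an illustration under stated assumptions: memory is a dictionary, the `readable` policy is supplied by the caller, and the constructor names are chosen for the example. The codeptr type is omitted because it quantifies over all register banks and so cannot be evaluated by enumeration.

```python
def int_type():
    """int: every triple belongs to the type."""
    return lambda k, m, x: True

def int_eq(n):
    """int=(n): the root value is exactly n."""
    return lambda k, m, x: x == n

def box(tau, readable):
    """box(tau): x is a readable address whose content has type tau at index k - 1."""
    def member(k, m, x):
        return x in m and readable(x) and tau(k - 1, m, m[x])
    return member

def field(i, tau, readable):
    """field(i, tau): like box(tau), but at offset i from the root address."""
    return lambda k, m, x: box(tau, readable)(k, m, x + i)

m = {100: 7, 200: 100}
readable = lambda a: a in m          # hypothetical policy: mapped means readable

points_to_7 = box(int_eq(7), readable)(1, m, 100)
wrong_content = box(int_eq(7), readable)(1, m, 200)
two_hops = box(box(int_eq(7), readable), readable)(2, m, 200)
at_offset = field(100, int_eq(7), readable)(1, m, 0)
```

Each dereference consumes one unit of the index, mirroring the observation that a load instruction takes one computation step. None of these four types inspects k beyond decrementing it, so closure under approximation holds trivially here; the index only becomes essential for types such as codeptr.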

Moreover, we often need to judge not only scalar values such as a singleton
integer but also vector values such as the register bank type, typing environments,
and code pointer preconditions (a list of arguments and their types). Vector values
are modeled in a similar way except that the root pointer is a vector, a function from
numbers to values. For example, (m, r) : φ means that the register bank satisfies
φ. Another use of vector types is the label environment Γ, which summarizes the
preconditions of all labels. In this case, x is the identity vector id which maps
label l to itself. Thus, (m, id) : {l : codeptr(φ)} means that label l itself has type
codeptr(φ).

The indexed model of vector values is shown in Figure 4.2. Any value belongs
to ⊤_φ, and no value belongs to ⊥_φ. The singleton environment only cares about
the nth slot of the vector and states that its content has type τ. The intersection
of two environments φ1 ∩ φ2 states that the value satisfies both environments. The
extension of an environment with a new binding is represented as φ[n ↦ τ].
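Environments can be sketched in the same style as predicates on (k, m, x), where x is now a vector. In this Python illustration the vector is a dictionary from slot numbers to values, and the names `top`, `bottom`, `singleton`, and `intersect` are chosen for the example. The extension operator is left out because its definition depends on how the replaced slot value y is chosen, which the text above does not spell out.

```python
def int_type():
    return lambda k, m, x: True

def int_eq(n):
    return lambda k, m, x: x == n

def top():
    """Every vector belongs to this environment."""
    return lambda k, m, x: True

def bottom():
    """No vector belongs to this environment."""
    return lambda k, m, x: False

def singleton(n, tau):
    """{n : tau}: only slot n is constrained, and its content has type tau."""
    return lambda k, m, x: tau(k, m, x[n])

def intersect(phi1, phi2):
    """phi1 ∩ phi2: the vector satisfies both environments."""
    return lambda k, m, x: phi1(k, m, x) and phi2(k, m, x)

# A register bank viewed as a vector, and a precondition over two registers.
r = {1: 5, 2: 9}
m = {}
phi = intersect(singleton(1, int_eq(5)), singleton(2, int_type()))
satisfied = phi(0, m, r)
violated = intersect(phi, singleton(2, int_eq(0)))(0, m, r)
```

Building a precondition as an intersection of singletons matches the way Δ and Γ are later assembled from one binding per address or label.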

With the semantic model of types and environments, the typing rules can be
proved as lemmas [Appel and Felty, 2000; Appel and McAllester, 2001]. For
example, the following codeptr elimination rule is proved as a lemma.

\[
\frac{(m,x) :_{k+1} \mathrm{codeptr}(\phi) \qquad r(\mathrm{pc}) = x \qquad (m,r) :_{k} \phi}
     {\text{safe-n-state}(k, r, m)}
\quad \text{CodePtr-E}
\]


The rule means that if (1) value (m, x) is of type codeptr(φ) to approximation
k + 1, (2) the current program counter is x, and (3) the current memory and register
bank (m, r) meets φ to approximation k, then it is safe to execute k steps under
the current machine state (m, r).


4.6  Safety  Proof 

Figure  4.3  shows  a  program,  in  the  LTAL  and  SPARC  assembly  language,  compiled 
by  the  FPCC/ML  compiler.  The  LTAL  program  has  two  basic  blocks,  each  of  which 
is  annotated  by  a  precondition.  To  make  it  simple  and  readable,  we  have  omitted 
some  type  annotations,  such  as  coercions,  inside  the  basic  blocks.  We  have  also 
omitted  the  map  from  variables  to  registers. 

Type annotations are generated by a type-preserving compiler from source
language types. They serve as a specification (types as specifications). On the other
hand, these type annotations are not verified yet; they are the invariants that the
compiler believes. They need to be verified through a sound type system.

In the LTAL type system, we have typing judgements for programs, basic blocks,
individual instructions, and so on. In order to prove type checking rules as lemmas,
we must define the meaning of typing judgements. Informally, we define the
following models. The model of the program typing judgement is that all labels are
safe to execute with respect to their preconditions (although it suffices to ensure
that the start label is safe). The model of the basic block typing judgement is that
the

CHAPTER  4.  MACHINE-CHECKABLE  SOUNDNESS  PROOFS 




LTAL                              SPARC

l0 : φ0                           l0:
  v10 = 0                           mov  0,%o4
  v11 = addradd(v4, 8)              add  %o7,8,%g1
  v19 = lbladd(v2, l1 − l0)         add  %o1,l1-l0,%g2
  v12 = v19
  v8  = v12
  v7  = v11
  v6  = v10
  v5  = v3
  calln(l1)
l1 : φ1                           l1:
  v9  = v5 + 1                      add  %o0,1,%o0
  v13 = open(v7)
  v14 = v13                         mov  %g1,%o1
  v15 = v13
  v16 = v14
  v17 = v6
  v18 = v9
  call(v15)                         jmp  [%g1+%g0]
                                    nop

Figure 4.3: An example LTAL program. (This program is compiled by the
FPCC/ML compiler from ML function “fun f(x)=x+1”.)


basic  block  in  consideration  is  safe  for  at  least  k  +  1  steps  assuming  all  the  other 
basic  blocks  are  safe  for  k  steps.  Let’s  define 


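\begin{align*}
\mathrm{instr}(i) &= \{(k,m,x) \mid \mathrm{decode}(m(x), i)\}\\
\Delta &= \{0 : \mathrm{instr}(i_0)\} \cap \{4 : \mathrm{instr}(i_1)\} \cap \ldots\\
\Gamma &= \{l_0 : \phi_0\} \cap \{l_1 : \phi_1\} \cap \ldots\\
\Delta \subset \Gamma &= \forall k,m.\;\; (m, \mathit{id}) :_k \Delta \;\Rightarrow\; (m, \mathit{id}) :_k \Gamma
\end{align*}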


The instr(i) is an indexed type that relates a memory content and the
instruction it represents. Δ is the indexed type representation of the
machine-language program (a sequence of integers encoding machine instructions).
Γ is the label environment (precondition) for each label (basic block) in the code.
The subtyping relation Δ ⊂ Γ states that code Δ respects invariant Γ under any
approximation k, which is exactly the meaning of the program typing judgement.
Note that id is the identity vector, which maps root addresses (labels in this case)
to themselves.

loaded(p, m, 0)
--------------- (7)
 (m, id) : Δ            Δ ⊂ Γ  (8)         Γ ⊂ {0 : codeptr(φ0)}
----------------------------------------------------------------- (6)
                  (m, id) : {0 : codeptr(φ0)}
                  --------------------------- (5)
                     (m, 0) : codeptr(φ0)
                  --------------------------- (4)
                  ∀k. (m, 0) :_k codeptr(φ0)              (m, r) : φ0
                  --------------------------- (3a)     ----------------- (3b)
                  (m, 0) :_{k+1} codeptr(φ0)           ∀k. (m, r) :_k φ0        r(pc) = 0
                  ------------------------------------------------------------------------ (2)
                                      ∀k. safe-n-state(k, r, m)
                                      ------------------------- (1)
                                          safe-state(r, m)

Figure 4.4: Outline of safety proof.

The proof outline is shown in Figure 4.4. This is a proof of safe(p) according
to its definition. The assumptions are loaded(p, m, 0), r(pc) = 0, and (m, r) : φ0.
For the sake of simplicity, we assume the start address is 0 here. For Step (1),
to prove (r, m) is safe, it suffices to prove that (r, m) is safe for an arbitrary
number k of steps. Step (2) is justified by rule CodePtr-E in Section 4.5. Step (3a)
is by universal instantiation. Step (3b) is by definition. Step (4) is the unfolding of
the syntactic sugar of (m, 0) : codeptr(φ0). Step (5) is by the definition of the
singleton environment. Step (6) is by the transitivity of subtyping. Step (7) can be
easily proved by unfolding definitions. Step (8) is by type checking the
corresponding LTAL and SPARC program. The model of program typing judgement
should be strong enough to prove (8). Please see Chapter 3 and Section 5.3.4 for
the LTAL type checking rules that establish this subtyping relation. Tan [2005] and
Tan et al. [2004] explain in more detail the semantic models that construct the
proof of this subtyping relation.

We briefly explain the semantic techniques we used to establish the subtyping
relation in Step (8) above. The subtyping relation Δ ⊂ Γ can be proved by induction
over the number of execution steps that are safe starting from labels in Γ. The base
case is trivial because it is safe to run zero steps from any label. In the inductive
case, for the current label l we prove that it is safe to execute at least k + 1 steps
assuming that it is safe to execute at least k steps starting from any label. This is
established by checking individual instructions in the current block. Suppose there
is at least one real (not virtual) instruction in the current block. Then we can prove
that it is safe to run at least one step starting from the current label by checking
individual instructions in the current block. Furthermore, the last instruction in
the current block must be a branch, jump, or return instruction that transfers the
control to some label l'. By induction, we know that it is safe to execute at least
k steps starting from label l'. Therefore, we conclude that it is safe to execute at
least k + 1 steps starting from the current label l. We check this for each label, and
thus establish the subtyping relation as required.

Take the LTAL program in Figure 4.3 as an example. There are two labels, l0 and
l1, and two basic blocks. The block labeled l0 has three real instructions and six
virtual instructions (coercions). The block l1 has four real instructions, including
the nop instruction at the end, and five virtual instructions. The base case k = 0
is trivially true because any program is safe to run 0 steps. For the inductive case,
we assume that the labels l0 and l1 are both safe for executing k steps starting from
them, respectively. By checking the real instructions in the body of basic block l0,
we conclude that it is safe to execute at least 3 steps starting from l0. Since the
last instruction falls through to block l1, which is safe for k steps, we conclude
that it is safe to execute at least k + 1 steps starting from l0. Similarly, we can
conclude that it is safe to execute at least k + 1 steps starting from label l1. What
is worth mentioning is that the last instruction of the basic block labeled l1 is a
jmp instruction. The target label is in a register since variable v15 is in a register.
This target label is actually the return address to which it is safe to jump as we
specify in the initial precondition φ0. Thus we conclude that label l1 is also safe.
By induction on the number k, we conclude that the program is safe for executing
any number of steps.

Building a semantic model for a large calculus such as LTAL and proving its
soundness are rather intricate. Interested readers should refer to several papers
and PhD theses [Appel and Felty, 2000; Appel and McAllester, 2001; Ahmed et al.,
2002; Ahmed, 2004; Swadi, 2003; Tan et al., 2004]. Appel and Felty [2000] and
Appel and McAllester [2001] present an (indexed) semantic model of types; Ahmed
et al. [2002] and Ahmed [2004] extend the model for general reference types; Swadi
[2003] introduces Typed Machine Language (TML) and builds an abstraction layer
on which the semantic model of LTAL is based; Tan et al. [2004] and Tan [2005]
give a more detailed description of the semantic models of machine instructions and
basic blocks in LTAL.


4.7  Implementation 

Proofs are written and machine-checked in the theorem-proving systems Twelf
[Pfenning and Schürmann, 1999, 2002] and Flit [Appel et al., 2002; Wu et al., 2003].
Currently, we have 1,865 lines of axioms of the logic, arithmetic, the specification
of the SPARC machine, and the safety specification; and about 165k lines of proof
in total, including lemmas of logic, arithmetic, sets, lists, conventions of machine
states, semantic model of types, machine instructions, and the LTAL calculus.

We have built many layers of abstraction to make the large proof implementation
as modular as possible. Abstract modules such as mathematical sets, lists, and the
theory of arithmetic are developed. These modules are all based on higher-order
logic implemented in LF. We have also implemented conventions of machine states
and semantic models of types and machine instructions. Among these abstractions,
two significant ones are TML and LTAL. TML is an expressive typed calculus for
proving properties of low-level programs such as machine code, but its type system
is too expressive to admit a decidable, let alone efficient, type-checking algorithm.
LTAL, however, has a syntax-directed type checking algorithm as we presented in
Chapter 3. LTAL is also designed for checking low-level programs, but admits
efficient type checking. TML is useful for building semantic models of low-level
calculi such as LTAL and for proving their soundness, while LTAL is efficient
enough to be used as the interface between the compiler and the proof checker.

Table 4.2 presents the breakdown of the proofs in our system according to
abstractions and modules. The first column lists the modules. The second column
gives the number of lines of code including comments and blank lines. The third
column gives the number of lines of code excluding comments and blank lines. In
total, we have approximately 143.4k lines of code, not including comments and
blank lines. In particular, the syntactic implementation of LTAL in LF is about
4,160 lines of code. These are constructor declarations and Prolog-like clauses that
encode the LTAL type checking rules. The breakdown of these constructors and
rules is shown in Table 3.1.

Modules                                  Lines    Useful Lines
Safety specification                     2,939          1,957³
Logic, arithmetic, & algebra             7,744          5,241
Sets, relations, & partial functions     9,229          8,086
Lists, vectors, trees                   12,338         11,024
Miscellaneous                            6,239          5,522
Machine conventions                     25,890         22,321
Machine states                           4,014          3,500
Abstract machine instructions            9,358          8,294
TML                                     54,800         48,124
LTAL                                    34,378         29,345
  LTAL (syntax only)                     6,641          4,163
Total                                  166,929        143,414

Table 4.2: FPCC proof statistics.


4.8  Related  Work 

The semantic approach to proving the soundness of logical and type systems has
been around for decades. Schmidt [1986], Gordon [1988], and Wahab [1998] prove
the soundness of Hoare logic based on denotational semantics. Such verification
has been mechanized in HOL [Gordon, 1988]. Loop invariants are specified in
first-order or higher-order logic and cannot be derived automatically, so the
approach does not scale to large programs.

Appel and Felty [2000] apply the semantic approach to PCC: they construct a
semantic model of types and machine instructions in higher-order logic, and prove
soundness by proving the typing rules as lemmas. This semantic model has been
extended to include general recursive types [Appel and McAllester, 2001] and mutable
references [Ahmed et al., 2002].

3 This number includes the axioms for floating point number arithmetic, while the
number presented in Table 4.1 and reported by Appel et al. [2003] does not.

Hamid et al. [2002] and Crary [2003] follow the syntactic approach to prove type
soundness. The syntactic approach has two stages. In the first stage, a typed assembly
language is designed and its operational semantics is specified on top of an abstract
machine; the syntactic type-soundness theorems are then proved on this abstract
machine following the scheme presented by Wright and Felleisen [1994]. In the second
stage, they use a relation to simulate the operations between the typed abstract
machine and the untyped concrete architecture.


Chapter  5 


Foundational  Proof  Checking  with 
Small  Witnesses 


Proof  checkers  for  proof-carrying  code  (and  similar  systems)  can  suffer  from  two 
problems:  huge  proof  witnesses  and  untrustworthy  proof  rules.  No  previous  design 
has  addressed  both  of  these  problems  simultaneously.  In  this  chapter,  we  show 
the  theory,  design,  and  implementation  of  a  proof-checker  that  permits  small  proof 
witnesses  and  machine-checkable  proofs  of  the  soundness  of  the  system. 


5.1  Introduction 

In a proof-carrying code system [Necula, 1997], or in other proof-carrying
applications [Appel and Felten, 1999], an untrusted prover must convince a trusted
checker of the validity of a theorem by sending a proof. Two of the potential problems
with this approach are that the proofs might be too large, and that the checker might
not be trustworthy. Each of these problems has been solved separately; in this chapter
we show how to solve them simultaneously. The general approach is to write a logic
program that has a machine-checked semantic correctness proof. The logic program
encodes the proof inference system. This technique can be used in other domains
(besides “proof-carrying”) to write logic programs with machine-checked guarantees
of correctness.

5.1.1  Small  proof  witnesses 

Necula has a series of results on reducing proof size [Necula and Lee, 1998; Necula
and Rahul, 2001]. He represents logics, theorems, and proofs in the LF logical
framework [Harper et al., 1993]. But the natural representation of an LF proof
contains redundancy (common subexpressions) that can cause exponential blowup
if the proofs are written in the usual textual representation. Necula’s LF_i data
structure [Necula and Lee, 1998] eliminated most of this redundancy, leading to
reasonable-sized proof terms.

In the PCC framework, given a machine-language program, the proof is of a
theorem that the program obeys some safety property. It’s natural to compare the
size of the representation of the proof witness to the size of the binary
machine-language program. Necula’s LF_i proof witnesses were about 4 times as big
as the programs whose properties they proved.

Pfenning’s Elf and Twelf systems [Pfenning, 1994; Pfenning and Schürmann,
1999] are implementations of the LF logical framework. In these systems,
proof-search engines can be represented as logic programs, much like (dependently
typed, higher-order) Prolog. Elf and Twelf can build proof witnesses automatically if
the rule set is encoded as a logic program. If each logic-program clause is viewed as an
inference rule, then the proof witness is a trace of the successful goals and subgoals
executed by the logic program. That is, the proof witness is a tree whose nodes
are the names of clauses and whose subtrees correspond to the subgoals of these
clauses.

Necula’s  theorem  provers  were  written  in  this  style,  originally  in  Elf  and  later  in  a 
logic-programming  engine  that  he  built  himself.  In  later  work,  he  moved  the  prover 
clauses  into  the  trusted  checker.  In  principle,  proof  witnesses  for  such  a  system  can 
be  just  a  single  bit,  meaning,  “A  proof  exists:  search  and  ye  shall  find  it.”  However, 
to  guarantee  that  proof-search  time  (in  the  trusted  checker)  would  be  small,  Necula 
invented  oracle-based  checking  [Necula  and  Rahul,  2001]:  The  untrusted  prover 
would  record  a  sequence  of  bits  that  recorded  which  subgoals  failed  (and  therefore, 
where  backtracking  was  required).  This  bitstream  serves  as  an  “oracle”  that  the 
trusted  checker  can  use  to  avoid  backtracking.  The  oracle  bitstream  need  not  be 
trusted;  if  it  is  wrong,  then  the  trusted  checker  will  choose  the  wrong  clauses  to 
satisfy  subgoals,  and  will  fail  to  find  a  proof. 

Using  oracle-based  checking,  the  proof  witness  (the  oracle  bitstream)  is  about 
1/8  the  size  of  the  machine  code.1  The  key  idea  is  to  run  a  simple  Prolog  engine 
in  the  trusted  proof  checker;  the  oracle  is  just  an  optimization  to  ensure  that  the 
checker  doesn’t  run  for  too  long. 
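
The division of labour between the untrusted prover and the trusted checker can be
sketched in a few lines of Python. This is only an illustration under simplifying
assumptions: goals are propositional (no unification), the rule base `RULES` and the
function names `prove` and `check` are hypothetical, and the oracle stores the index
of the clause that succeeded at each goal rather than the bit-level encoding of failed
subgoals used by Necula and Rahul.

```python
from typing import Dict, List, Optional

# Hypothetical rule base: each goal maps to its alternative clauses, and each
# clause is a list of subgoals (an empty list is a fact).
Rules = Dict[str, List[List[str]]]

RULES: Rules = {
    "safe": [["typed_a"], ["typed_b"]],
    "typed_a": [["missing"]],            # this alternative cannot be proved
    "typed_b": [["fact1", "fact2"]],
    "fact1": [[]],
    "fact2": [[]],
}


def prove(goal: str, rules: Rules) -> Optional[List[int]]:
    """Untrusted prover: depth-first search with backtracking.

    Returns the clause indices that succeeded, in goal-visit order, or None.
    """
    for index, body in enumerate(rules.get(goal, [])):
        trace = [index]
        for subgoal in body:
            sub = prove(subgoal, rules)
            if sub is None:
                break                    # this clause failed; try the next one
            trace.extend(sub)
        else:
            return trace                 # every subgoal of this clause succeeded
    return None


def check(goal: str, rules: Rules, oracle: List[int]) -> bool:
    """Trusted checker: follows the oracle and never backtracks."""
    pos = 0

    def run(g: str) -> bool:
        nonlocal pos
        clauses = rules.get(g, [])
        if pos >= len(oracle) or not 0 <= oracle[pos] < len(clauses):
            return False
        body = clauses[oracle[pos]]
        pos += 1
        return all(run(sub) for sub in body)

    return run(goal) and pos == len(oracle)


oracle = prove("safe", RULES)
print(oracle, check("safe", RULES, oracle))
```

A wrong oracle cannot make the checker accept an unprovable goal; it can only make
the checker pick a clause whose subgoals fail, in which case `check` returns `False`.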


1 Unfortunately, this statistic is somewhat misleading. A “pure” PCC system would transmit
two components from an untrusted code producer to a code consumer: a machine-language
program and a proof witness. The SpecialJ proof-carrying Java system on which Necula measured
oracle-based checking transmits three components: the machine code, the proof, and a Java “class
file”. The Java class file, as is usual in any Java system, contains descriptions of the types of all
procedures (methods) in the program, including formal parameter and result types. These method
types help guide the proof search. However, the “1/8 size” figure does not include the Java class
files. In our FPCC/ML system, all auxiliary type information needed by the checker is
contained within the LTAL expressions whose size we report.

5.1.2 Trustworthy checkers

Necula’s oracle-based checker for PCC comprises approximately 26,000 lines of code:

23,000  Verification-condition generator, written in C
 1,400  LF proof checker, written in C
   800  Oracle-based Prolog interpreter, in C
   700  Axioms for type system, written in LF
26,000  Total trusted lines of code

The largest component is the verification condition generator (VC-Gen), which
traverses the machine-language program and extracts a formula in logic, the
verification condition, which is true only if the program obeys a given safety policy.

These 26,000 lines form the trusted code base (TCB) of the system: Any bug
in the TCB may cause an unsafe program to be accepted. The large VC-Gen
component is a concern, but so are the axioms of the type system: If the type
system is not sound, then unsafe programs will be accepted. League et al. [2003]
have shown that one of the SpecialJ typing rules is unsound.

The  goal  of  our  research  [Appel,  2001]  is  to  check  proofs  of  program  safety  using 
a  much  smaller  TCB.  We  do  this  by  eliminating  the  VC-Gen  component — we  reason 
directly  about  machine  code  in  higher-order  logic,  instead  of  the  two-step  process  of 
extracting  the  verification  condition  and  then  proving  it;  and  we  write  the  rules  of 
our type system as machine-checkable lemmas, instead of axioms. We have shown
that the TCB for a proof-carrying code system can be reduced below 2,700 lines, as
follows [Appel et al., 2002]:

  803  LF proof checker, written in C
  135  Axioms & definitions of higher-order logic, in LF
  160  Axioms & definitions for arithmetic, in LF
  460  Specification of SPARC instruction encodings, in LF
1,005  Specification of SPARC instruction semantics, in LF
  105  Specification of safety predicate, in LF
2,668  Total trusted lines of code

Unfortunately,  in  this  prototype  system  the  proof  witnesses  are  huge:  The  DAG 
representation  of  a  safety  proof  of  a  program  might  be  1000  times  as  large  as  the 
program.  Proof  size  is  approximately  linear  in  the  size  of  the  program,2  so  this 
factor  of  1000  will  not  grow  substantially  worse  for  larger  programs.  However, 
while  this  early  prototype  is  useful  in  showing  how  small  the  TCB  can  be  made,  it 
is  impractical  for  real  applications  because  the  proof  witnesses  are  too  big. 

5.1.3  Synthesis 

We will show that Necula’s insight (run a resource-limited Prolog engine in the
trusted checker) can be combined with our paranoia (don’t trust the logic
programming rules used by such a Prolog engine) to make a PCC checker with small
witnesses and a small trusted base.

Our approach is as follows. We write a type-checking algorithm in a subset of
Prolog with no backtracking and with efficiently indexed dynamic atomic clauses.
We show that the operators of such a Prolog program can be given a semantics
in higher-order logic, such that the soundness of each clause can be proved as a
machine-checkable lemma. We show that this Prolog subset is adequate for writing
efficient type-checkers for PCC and for other “proof-carrying” applications.

2 Technically, proof size is roughly proportional to the size of the program multiplied by the
average number of live variables on entry to a basic block; this is superlinear but much less than
quadratic, for typical programs.

Our trusted checker is sent the Prolog clauses, with machine-checkable soundness
proofs; it checks these proofs before installing the clauses. Then it is sent a theorem
to check (i.e., in a PCC application, the safety of a particular machine-language
program) and a small proof witness. The Prolog program traverses the theorem
and proof witness; this traversal succeeds only if the theorem is valid.

The  TCB  size  of  our  new  checker  is  3034  lines  of  code,  only  366  lines  larger 
than  our  previous  prototype.  It  mainly  includes  all  the  components  of  our  previous 
system  (2668  lines)  plus  a  concise  implementation  of  an  interpreter  (282  lines  of  C 
code)  for  our  Prolog  subset. 

5.2  Semantic  Proofs  of  Horn  Clauses 

We  will  illustrate  our  approach  using  an  example — a  type  checker  for  a  very  simple 
programming  language.  In  this  example  we  illustrate  the  following  points,  which 
are  common  to  many  proof-carrying  applications: 

•  The  specification  of  the  theorem  to  be  proved  is  quite  simple  (in  this  case, 
that  the  program  evaluates  to  an  even  number). 

•  The proof technique involves the definition of a carefully designed set of
predicates that allow a simple, syntax-directed decision procedure (in this case,
we define a syntax-directed type system for evenness and oddness).

$$
\begin{array}{llcl}
              & \tau & ::= & \mathsf{even} \mid \mathsf{odd} \\
\textit{decl} & d    & ::= & \cdot \mid \mathsf{let}\ x = e;\ d \\
\textit{expr} & e    & ::= & x \mid n \mid e_1 + e_2 \\
\textit{prog} & p    & ::= & (d;\ e)
\end{array}
$$

Figure 5.1: Syntax of even-odd system.

•  The  syntax-directed  rules  are  provable,  from  the  definitions  of  the  operators, 
as  machine-checkable  lemmas  in  the  underlying  higher-order  logic  (this  is  what 
foundational  means:  The  rules  are  provable  from  the  foundations  of  logic). 

•  The  syntax-directed  rules  require  management  of  a  symbol  table,  or  context, 
that  would  lead  to  a  quadratic  algorithm  if  implemented  naively;  we  want  a 
linear-time  prover,  and  we’ll  show  how  to  make  one. 

•  The language being type checked in a proof-carrying code system (or in
proof-carrying authentication) is the output of another program: the compiler (or
a prover). Such languages don’t need all of the syntactic sugar that
human-readable languages have, and processing them is therefore easier.

5.2.1 Example: even-valued expressions

Consider  a  simple  calculus  for  expressions  with  constants,  variables,  addition,  and 
let-binding,  as  shown  in  Figure  5.1. 

A  program  consists  of  a  list  of  declarations  and  an  expression.  An  expression 
is  either  a  variable,  a  natural  number,  or  the  sum  of  two  expressions.  Here  is  an 
example: 

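let x = 4; let y = x + 8; x + y

There are two declarations followed by an expression; the program evaluates to 16.

$$
\begin{array}{rcl}
\mathit{Var} & = & \mathit{Num} \\
\mathit{State} & = & \mathit{Var} \to \mathit{Num} \to \mathit{Form} \\
\mathit{Decl} & = & \mathit{State} \to \mathit{Form} \\
\mathit{Exp} & = & \mathit{State} \to \mathit{Num} \to \mathit{Form} \\
\mathit{Program} & = & (\mathit{Decl},\ \mathit{Exp}) \\[1ex]
(d;\ e) & = & (d,\ e) \\
\cdot & = & \lambda s.\ \mathit{true} \\
\mathsf{let}\ x = e;\ d & = & \lambda s.\ d\ s \wedge (\forall a.\ e\ s\ a \Rightarrow s\ x\ a) \\
x & = & \lambda s.\lambda a.\ s\ x\ a \\
n & = & \lambda s.\lambda a.\ a = n \\
e_1 + e_2 & = & \lambda s.\lambda a.\ \exists a_1.\exists a_2.\ e_1\ s\ a_1 \wedge e_2\ s\ a_2 \wedge a = a_1\ \mathit{plus}\ a_2 \\
\mathit{safe} & = & \lambda p.\ \forall s.\ \mathit{fst}(p)\ s \Rightarrow \exists a.\ \mathit{snd}(p)\ s\ a \wedge \mathit{isEven}(a)
\end{array}
$$

Figure 5.2: Safety specification.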
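
To make the example program of Section 5.2.1 concrete, here is a minimal sketch of
an evaluator for the calculus of Figure 5.1. It is an operational reading written for
illustration only, not the relational semantics of Figure 5.2; the class names `Var`,
`Num`, `Plus` and the functions `eval_expr` and `eval_program` are assumptions of
this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Plus:
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Num, Plus]
Decl = List[Tuple[str, Expr]]   # [(x, e), ...] stands for "let x = e; ..."


def eval_expr(e: Expr, env: Dict[str, int]) -> int:
    """Evaluate an expression in an environment mapping variables to numbers."""
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Num):
        return e.value
    return eval_expr(e.left, env) + eval_expr(e.right, env)


def eval_program(decls: Decl, body: Expr) -> int:
    """Process the declarations left to right, then evaluate the final expression."""
    env: Dict[str, int] = {}
    for name, e in decls:
        env[name] = eval_expr(e, env)
    return eval_expr(body, env)


# let x = 4; let y = x + 8; x + y
example_decls = [("x", Num(4)), ("y", Plus(Var("x"), Num(8)))]
example_body = Plus(Var("x"), Var("y"))
print(eval_program(example_decls, example_body))  # 16
```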

5.2.2  Safety  specification 

In  this  simple  example,  we  define  that  a  “safe”  program  is  one  that  evaluates  to 
an  even  number.  In  order  to  define  the  safety  theorem,  we  need  to  know  what  a 
program means and how to evaluate a program. The safety predicate, along with
a conventional denotational semantics of the language under consideration, is shown in
Figure 5.2.

All of these definitions are treated as axiomatic by our checker; that is, they
are trusted. We have predefined types Num for numbers and Form for formulas (or
propositions). Variables are represented as numbers. An abstract machine State
maps a variable to its content, i.e., a number. A program is a pair of a declaration
and an expression; its semantics is the pair of semantics of the corresponding
declaration and expression.3 Declaration Decl is a predicate on states. Expression
Exp is a predicate on a state and a number; that is, given a state the expression
evaluates to a number. The semantics of concrete expressions is straightforward
from definitions.

3 An alternative denotation for a program is a number, resulting from applying the state after
the declaration to the expression.

Finally, the safety theorem is based on the semantics of language constructs.
A program p is “safe” if, for any state s, whenever the declaration of the program,
i.e., fst(p), holds on s, there exists a number a such that the expression of the
program, i.e., snd(p), evaluates to a and a is even.
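
As a worked instance, the definitions of Figure 5.2 can be unfolded by hand for the
example program of Section 5.2.1. The abbreviation $d_0$ for its declaration list,
let x = 4; let y = x + 8; ·, is introduced here only for illustration. For any state
$s$:

$$
\begin{aligned}
d_0\ s &\iff s\ x\ 4 \;\wedge\; \bigl(\forall a_1.\ s\ x\ a_1 \Rightarrow s\ y\ (a_1\ \mathit{plus}\ 8)\bigr) \\
       &\implies s\ x\ 4 \;\wedge\; s\ y\ 12, \\
(x + y)\ s\ 16 &\Longleftarrow s\ x\ 4 \;\wedge\; s\ y\ 12 \;\wedge\; 16 = 4\ \mathit{plus}\ 12, \\
\mathit{isEven}(16) &\quad \text{since } 16 = 2 \cdot 8 .
\end{aligned}
$$

So whenever the declarations hold on $s$, the witness $a = 16$ satisfies the
existential in the definition of safe, and the example program is safe.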

5.2.3  Type  checker 

The typing rules appear in Figure 5.3. There are three kinds of typing judgements.
The judgement for a program, $\vdash_p$, checks that the program evaluates to a
number whose type is $\tau$. The declaration judgement $\vdash_d$ states that,
assuming the environment built so far, and assuming the remaining declarations hold,
the expression has a certain type. The expression judgement $\vdash_e$ asserts that an
expression has a certain type under typing context $\Gamma$.

These typing rules can be read as a Prolog-like logic program. Each rule is a
clause of the logic program. The conclusion of a rule is the head of the clause, and
each premise of the rule is a subgoal. The typing rules are designed such that the
conclusions of these typing rules are disjoint. Therefore, when running the type
checker (as a logic program) there is no need to backtrack; we say that such a type
system is syntax-directed.
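
The absence of backtracking can be seen in a direct transcription of the rules into
a recursive function: each syntactic form selects exactly one rule. The following
Python sketch is illustrative only. Expressions are encoded as tagged tuples, the
context is an ordinary dictionary (the rules themselves do not fix a lookup
algorithm), the names `type_expr`, `type_decls`, and `is_safe` are hypothetical, and
the case for numerals is an assumption of this sketch, since Figure 5.3 shows no rule
for constants.

```python
from typing import Dict, List, Tuple

EVEN, ODD = "even", "odd"

# Expressions are tagged tuples: ("var", x), ("num", n), ("plus", e1, e2).


def box_plus(t1: str, t2: str) -> str:
    """The four addition rules: equal parities give even, otherwise odd."""
    return EVEN if t1 == t2 else ODD


def type_expr(ctx: Dict[str, str], e: tuple) -> str:
    """Expression judgement: exactly one branch applies to each form."""
    tag = e[0]
    if tag == "var":                       # VarTy
        return ctx[e[1]]
    if tag == "plus":                      # PlusTy
        return box_plus(type_expr(ctx, e[1]), type_expr(ctx, e[2]))
    if tag == "num":                       # assumed rule for numerals
        return EVEN if e[1] % 2 == 0 else ODD
    raise ValueError(f"unknown expression {e!r}")


def type_decls(ctx: Dict[str, str], decls: List[Tuple[str, tuple]], body: tuple) -> str:
    """Declaration judgement: DeclNilTy on an empty list, DeclConsTy otherwise."""
    if not decls:                          # DeclNilTy
        return type_expr(ctx, body)
    (x, e1), rest = decls[0], decls[1:]    # DeclConsTy
    t1 = type_expr(ctx, e1)
    return type_decls({**ctx, x: t1}, rest, body)


def is_safe(decls: List[Tuple[str, tuple]], body: tuple) -> bool:
    """ProgTy with the empty context, followed by SafeTy (the type must be even)."""
    try:
        return type_decls({}, decls, body) == EVEN
    except KeyError:                       # unbound variable: no rule applies
        return False


# let x = 4; let y = x + 8; x + y
decls = [("x", ("num", 4)), ("y", ("plus", ("var", "x"), ("num", 8)))]
print(is_safe(decls, ("plus", ("var", "x"), ("var", "y"))))
```

Copying the context and the declaration list at every step keeps the sketch close to
the rules but is not efficient; it is meant to show rule selection, not performance.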

Furthermore, if we give denotational semantics expressed in higher-order logic to
typing judgements such as $\vdash_p$, $\vdash_d$, and $\vdash_e$, each typing rule can
be proved as a lemma in the system; thus its soundness is guaranteed with respect to
the foundations of logic. The denotational semantics of typing judgements is given in
Figure 5.4. Proofs of the typing rules are quite straightforward and thus omitted here.
The denotational semantics of the type operators are part of the safety proof, not part
of the safety specification. That is, they are not trusted. It is straightforward to
prove the safety theorem from the conclusion of type checking rule ProgTy if we
pass $\tau = \mathsf{even}$ when invoking the type checker, as shown in the SafeTy rule.

4 For our PCC application, there are only two language constructs for the machine code to be
proved safe. The machine code is a sequence of integers encoding machine instructions; so we only
need cons and nil.

$$
\frac{\vdash_p p : \mathsf{even}}{\mathit{safe}(p)}\ \text{SafeTy}
\qquad
\frac{\cdot \vdash_d (d;\ e) : \tau}{\vdash_p (d;\ e) : \tau}\ \text{ProgTy}
$$

$$
\frac{\Gamma \vdash_e e_1 : \tau_1 \qquad \Gamma[x : \tau_1] \vdash_d (d;\ e) : \tau}
     {\Gamma \vdash_d (\mathsf{let}\ x = e_1;\ d;\ e) : \tau}\ \text{DeclConsTy}
$$

$$
\frac{\Gamma \vdash_e e : \tau}{\Gamma \vdash_d (\cdot;\ e) : \tau}\ \text{DeclNilTy}
\qquad
\frac{\Gamma(x) = \tau}{\Gamma \vdash_e x : \tau}\ \text{VarTy}
$$

$$
\frac{\Gamma \vdash_e e_1 : \tau_1 \qquad \Gamma \vdash_e e_2 : \tau_2 \qquad \tau_1 \boxplus \tau_2 = \tau}
     {\Gamma \vdash_e e_1 + e_2 : \tau}\ \text{PlusTy}
$$

$$
\frac{}{\mathsf{even} \boxplus \mathsf{even} = \mathsf{even}}\ \boxplus_{ee}
\qquad
\frac{}{\mathsf{even} \boxplus \mathsf{odd} = \mathsf{odd}}\ \boxplus_{eo}
$$

$$
\frac{}{\mathsf{odd} \boxplus \mathsf{odd} = \mathsf{even}}\ \boxplus_{oo}
\qquad
\frac{}{\mathsf{odd} \boxplus \mathsf{even} = \mathsf{odd}}\ \boxplus_{oe}
$$

Figure 5.3: Typing rules with static context.

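Our checker will determine the validity of the safety predicate by determining
whether a proof exists. It will not construct such a proof as a data structure; instead,
it will traverse a trace of such a proof, composing lemmas in a syntax-directed way.
We call our set of lemmas a type system: Our machine-checked safety proof of a
program P consists of (1) a proof of soundness for the type system, and (2) the
successful syntax-directed execution of the typing clauses as applied to P.

$$
\begin{array}{rcl}
\mathit{Ty} & \stackrel{\mathrm{def}}{=} & \mathit{Num} \to \mathit{Form} \\
\mathit{Env} & \stackrel{\mathrm{def}}{=} & \mathit{State} \to \mathit{Form} \\
\mathsf{even} & \stackrel{\mathrm{def}}{=} & \lambda x.\ \exists n.\ \mathit{isInt}(n) \wedge x = 2n \\
\mathsf{odd} & \stackrel{\mathrm{def}}{=} & \lambda x.\ \exists n.\ \mathit{isInt}(n) \wedge x = 2n + 1 \\[1ex]
\vdash_p p : \tau & \stackrel{\mathrm{def}}{=} & \forall s.\ \mathit{fst}(p)\ s \Rightarrow \exists a.\ \mathit{snd}(p)\ s\ a \wedge \tau\ a \\
\Gamma \vdash_d (d;\ e) : \tau & \stackrel{\mathrm{def}}{=} & \forall s.\ (d\ s \wedge \Gamma\ s) \Rightarrow \exists a.\ (e\ s\ a \wedge \tau\ a) \\
\Gamma \vdash_e e : \tau & \stackrel{\mathrm{def}}{=} & \forall s.\ \Gamma\ s \Rightarrow \exists a.\ (e\ s\ a \wedge \tau\ a) \\
\tau_1 \boxplus \tau_2 = \tau & \stackrel{\mathrm{def}}{=} & \forall n_1.\forall n_2.\ \tau_1\ n_1 \Rightarrow \tau_2\ n_2 \Rightarrow \tau\ (n_1 + n_2) \\
\Gamma[x : \tau] & \stackrel{\mathrm{def}}{=} & \lambda s.\ \Gamma\ s \wedge \exists a.\ s\ x\ a \wedge \tau\ a \\
\Gamma(x) = \tau & \stackrel{\mathrm{def}}{=} & \forall s.\ \Gamma\ s \Rightarrow \exists a.\ s\ x\ a \wedge \tau\ a
\end{array}
$$

Figure 5.4: Definitions of types and judgements.

Efficiency and proof size problem. When type checking a program, we build
a type environment, or context, from the declarations for variables that appear
in the expression. The rules for traversing a list of declarations and building the
corresponding type contexts are DeclConsTy and DeclNilTy. When a variable is
encountered, we look up its type in the context. However, the typing rule VarTy
does not specify a context lookup algorithm. Consider the following variable
type-lookup rules.

$$
\frac{}{\Gamma[x : \tau] \vdash x : \tau}\ \text{VarTyHit}
\qquad
\frac{\Gamma \vdash x : \tau \qquad x \neq y}{\Gamma[y : \tau'] \vdash x : \tau}\ \text{VarTyMiss}
$$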


Suppose the context is simply organized as a list in these two rules; each element
of the list is a pair: a variable and its type. Then each context lookup takes linear
time, and type-checking a whole program will take quadratic time. Correspondingly,
the size of the generated proof for a lookup operation is linear with respect to the
size of the context, and thus the safety proof (and also the proof checking time)
for a program has a quadratic blowup. Our experiment with the even-odd example
shows that a naive implementation cannot check even medium-sized programs, while
the efficient algorithm, which will be described in the next section, scales very well.
This algorithm still has a provably sound semantic model, but generates concise
proofs and admits efficient proof checking.
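
The quadratic growth is easy to reproduce by counting rule applications. The sketch
below is a hypothetical experiment, not the one reported above: it assumes a worst-case
program of the form let x0 = 2; let x1 = x0 + 2; ...; let x(n-1) = x0 + 2, in which
every declaration refers to the oldest variable, and it compares a list-shaped context
(one VarTyMiss per skipped entry, then one VarTyHit) with a hash table.

```python
from typing import Dict, List, Tuple


def lookup_steps(context: List[Tuple[str, str]], name: str) -> int:
    """Count rule applications: one VarTyMiss per skipped entry, then one VarTyHit."""
    for steps, (var, _ty) in enumerate(context, start=1):
        if var == name:
            return steps
    raise KeyError(name)


def list_context_steps(n: int) -> int:
    """Total lookup steps with a list context whose newest binding comes first."""
    context = [("x0", "even")]
    total = 0
    for i in range(1, n):
        total += lookup_steps(context, "x0")    # x0 is always the oldest entry
        context.insert(0, (f"x{i}", "even"))
    return total


def hashed_context_steps(n: int) -> int:
    """Total lookup steps when the context is a hash table: one per lookup."""
    context: Dict[str, str] = {"x0": "even"}
    total = 0
    for i in range(1, n):
        _ = context["x0"]
        total += 1
        context[f"x{i}"] = "even"
    return total


for n in (10, 100, 1000):
    print(n, list_context_steps(n), hashed_context_steps(n))
```

For n declarations the list context needs n(n-1)/2 steps in this worst case, while
the hashed context needs n-1.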

5.3  Effective  Context  Management 

As we have explained, we avoid sending large proofs to the trusted checker by
sending a proof scheme with a soundness proof for the proof scheme. We want the
proof scheme to “execute” efficiently, that is, in linear time with respect to the size
of the program-safety theorem being proved. And we want the proof schemes to
be written in the “smallest possible” Prolog-like language: Which language
features are useful?

Here  we  will  show  an  efficient  proof  scheme  for  contexts;  because  this  scheme 
requires  dynamic  clauses  in  the  Prolog  subset,  we  have  included  a  limited  form  of 
dynamic  clauses  in  our  language  design. 

5.3.1  Dynamic  clauses  and  local  assumptions 

Many logic programming systems provide a facility for managing dynamic clauses
at run time. In Prolog, users can assert a fact or clause into the database or retract
a clause dynamically. The assert/retract mechanism can be expensive if the
dynamic clause under consideration is not atomic (i.e., has subgoals), because the
dynamic clause has to be compiled and integrated into the program’s decision trees.
If the dynamic clause is atomic, with input-mode arguments that are integers or
hashable, the assert/retract operation can be cheap: Prolog systems usually provide
efficient support for asserting and retracting an atomic clause by using hash tables.
That is, asserting, retracting, and querying indexable atomic clauses can be done in
constant time per operation.

In the LF logical framework [Harper et al., 1993], or its implementation Twelf
[Pfenning, 1991; Pfenning and Schürmann, 1999], one can use local assumptions
[Pfenning and Schürmann, 2002] to add dynamic clauses to the database. Since these
assumptions are local, their dynamic scopes control their lifetimes; there is no need
to provide an explicit retract mechanism. A clause of the form
$\{x : \tau\}\ A\ x \to B\ x$ introduces a local assumption $A\ x$ into the context and
then solves the goal $B\ x$ under this assumption.5 When proof search on goal $B$ has
finished, assumption $A$ is automatically retracted. That is, Twelf uses a dynamically
well-scoped version of assert/retract. One can, however, use Prolog’s assert/retract
mechanism to simulate Twelf’s local assumptions. We can give semantics to local
assumptions and generate concise proofs so that clauses are guaranteed to be correct.

Local assumptions are particularly effective (efficient, secure with a provably
sound model, and concise) when we need to deal with big environments and
generate proofs of lookups in these environments.
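
The scoping discipline of local assumptions can be imitated in a conventional language
with a hash table and a scope guard. The following Python sketch is only an analogy:
the class `ClauseDB` and its methods `assume` and `query` are hypothetical names, and
the table stores plain key/value facts rather than LF clauses. An assumption is
asserted on entry to a scope and retracted automatically on exit, and each operation
costs expected constant time.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional


class ClauseDB:
    """A table of atomic facts with dynamically scoped assert/retract."""

    def __init__(self) -> None:
        self._facts: Dict[str, List[str]] = {}   # key -> stack of values

    @contextmanager
    def assume(self, key: str, value: str) -> Iterator[None]:
        """Assert key -> value for the duration of the enclosed block."""
        self._facts.setdefault(key, []).append(value)
        try:
            yield
        finally:
            stack = self._facts[key]
            stack.pop()                          # automatic retract
            if not stack:
                del self._facts[key]

    def query(self, key: str) -> Optional[str]:
        """Return the innermost assumption for key, or None if there is none."""
        stack = self._facts.get(key)
        return stack[-1] if stack else None


db = ClauseDB()
with db.assume("x", "even"):
    print(db.query("x"))        # the assumption is visible inside its scope
    with db.assume("x", "odd"):
        print(db.query("x"))    # an inner assumption shadows the outer one
    print(db.query("x"))
print(db.query("x"))            # retracted once the scope has ended
```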

5.3.2  Typing  rules 

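In this subsection, we present an efficient type checking algorithm for environment
management using dynamic clauses. The semantics is presented in the next
subsection. Figure 5.5 shows the type checking rules with a dynamic environment
management scheme.

5 It is a dependent type on local parameter x.

$$
\frac{\vdash_p p : \mathsf{even}}{\mathit{safe}(p)}\ \text{SafeTy}
\qquad
\frac{d \vdash_d (d;\ e) : \tau}{\vdash_p (d;\ e) : \tau}\ \text{ProgTy}
$$

$$
\frac{\Gamma \vdash_e e_1 : \tau_1 \qquad \mathit{bind}(x, \tau_1, \Gamma) \to \Gamma \vdash_d (d;\ e) : \tau}
     {\Gamma \vdash_d (\mathsf{let}\ x = e_1;\ d;\ e) : \tau}\ \text{BindTy}
$$

$$
\frac{\Gamma \vdash_e e : \tau}{\Gamma \vdash_d (\cdot;\ e) : \tau}\ \text{BindNil}
\qquad
\frac{\mathit{bind}(x, \tau, \Gamma)}{\Gamma \vdash_e x : \tau}\ \text{VarTy}
$$

$$
\frac{\Gamma \vdash_e e_1 : \tau_1 \qquad \Gamma \vdash_e e_2 : \tau_2 \qquad \tau_1 \boxplus \tau_2 = \tau}
     {\Gamma \vdash_e e_1 + e_2 : \tau}\ \text{PlusTy}
$$

$$
\frac{}{\mathsf{even} \boxplus \mathsf{even} = \mathsf{even}}\ \boxplus_{ee}
\qquad
\frac{}{\mathsf{even} \boxplus \mathsf{odd} = \mathsf{odd}}\ \boxplus_{eo}
$$

$$
\frac{}{\mathsf{odd} \boxplus \mathsf{odd} = \mathsf{even}}\ \boxplus_{oo}
\qquad
\frac{}{\mathsf{odd} \boxplus \mathsf{even} = \mathsf{odd}}\ \boxplus_{oe}
$$

Figure 5.5: Typing rules with dynamic context.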

The  rule  ProgTy  calls  a  declaration  checking  rule  and  passes  declaration  d  to 
it.  The  declaration  d  appears  twice  in  the  premise.  The  declaration  checking  rules 
traverse  one  d,  and  the  other  d  is  used  to  pass  the  original  declaration  all  the  way 
to  the  expression  checking  rules. 

The rule BindTy requires some explanation. It first checks that the expression
$e_1$ has type $\tau_1$, then asserts this fact as a dynamic clause (or local assumption)
$\mathit{bind}(x, \tau_1, \Gamma)$ and continues type checking.

When type checking a variable expression, we try rule VarTy to match the
previously checked-in local assumptions. The lookup operation takes constant time
and the proof generated for it is concise. The $\boxplus$ rules remain the same as before.

In a conventional Prolog implementation that supports efficient assert/retract
operations for atomic dynamic clauses like $\mathit{bind}(x, \tau_1, \Gamma)$, the type
checking algorithm above is linear. Moreover, it is provably sound, as we will show next.
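
A minimal sketch of this scheme, again in Python rather than in the Prolog subset:
BindTy asserts a binding into a shared hash table before continuing with the
remaining declarations and retracts it when that subgoal finishes, and VarTy is a
single table query. The function name `check_program`, the tagged-tuple encoding of
expressions, and the numeral case are assumptions of this sketch; the third argument
of bind is omitted because it plays no role in the lookup itself.

```python
from typing import Dict, List, Tuple

EVEN, ODD = "even", "odd"


def check_program(decls: List[Tuple[str, tuple]], body: tuple) -> str:
    """Return the type of `body` after processing `decls` with dynamic clauses."""
    bind: Dict[str, List[str]] = {}       # x -> stack of asserted types

    def type_expr(e: tuple) -> str:
        tag = e[0]
        if tag == "var":                  # VarTy: one hash-table query
            return bind[e[1]][-1]
        if tag == "plus":                 # PlusTy
            t1, t2 = type_expr(e[1]), type_expr(e[2])
            return EVEN if t1 == t2 else ODD
        if tag == "num":                  # assumed rule, not shown in Figure 5.5
            return EVEN if e[1] % 2 == 0 else ODD
        raise ValueError(f"unknown expression {e!r}")

    def type_decls(i: int) -> str:
        if i == len(decls):               # BindNil
            return type_expr(body)
        x, e1 = decls[i]                  # BindTy
        t1 = type_expr(e1)
        bind.setdefault(x, []).append(t1) # assert bind(x, t1)
        try:
            return type_decls(i + 1)      # continue under the assumption
        finally:
            bind[x].pop()                 # retract when the subgoal is done

    return type_decls(0)


# let x = 4; let y = x + 8; x + y
decls = [("x", ("num", 4)), ("y", ("plus", ("var", "x"), ("num", 8)))]
print(check_program(decls, ("plus", ("var", "x"), ("var", "y"))))
```

Each declaration and each expression node is visited once and every lookup is a hash
query, so the work is linear in the size of the program. The recursion over
declarations mirrors the nesting of subgoals; a checker for long programs would need
an iterative formulation to stay within Python's default recursion limit.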


5.3.3  Foundational  semantics  and  proofs 

The safety specification remains the same as presented in Figure 5.2. The
definitions of types and typing judgements remain untouched except for $\vdash_d$ and
the new constructor bind.

$$
\begin{array}{rcl}
\Gamma \vdash_d (d;\ e) : \tau & = & \forall s.\ (\Gamma \sqsubseteq d \wedge \Gamma\ s) \Rightarrow \exists a.\ (e\ s\ a \wedge \tau\ a) \\
\mathit{bind}(x, \tau, \Gamma) & = & \forall s.\ \Gamma\ s \Rightarrow \exists a.\ (s\ x\ a \wedge \tau\ a) \\
d_1 \sqsubseteq d_2 & = & \forall s.\ d_1\ s \Rightarrow d_2\ s
\end{array}
$$

The semantics of dynamic clause $\mathit{bind}(x, \tau, \Gamma)$ is very similar to that
of the static binding operator $\Gamma[x : \tau]$ and lookup operator $\Gamma(x) = \tau$.
It serves both purposes. From these definitions it is straightforward to prove the
typing rules as lemmas, and the safety theorem can be proved from the successful type
checking of a program from the goal $\vdash_p (d;\ e) : \mathsf{even}$. Here we give the
proof for rule BindTy.

Lemma 5.3.3.1 (BindTy)

$$
\frac{\Gamma \vdash_e e_1 : \tau_1 \qquad \mathit{bind}(x, \tau_1, \Gamma) \to \Gamma \vdash_d (d;\ e) : \tau}
     {\Gamma \vdash_d (\mathsf{let}\ x = e_1;\ d;\ e) : \tau}\ \text{BindTy}
$$

PROOF: By definition of $\vdash_d$, for all states $s$, we assume
$\Gamma \sqsubseteq (\mathsf{let}\ x = e_1;\ d)$ and $\Gamma\ s$, then we prove
$\exists a.\ (e\ s\ a \wedge \tau\ a)$. This can be obtained from
$\Gamma \vdash_d (d;\ e) : \tau$. In order to use this fact, we need to prove the local
assumption $\mathit{bind}(x, \tau_1, \Gamma)$, which can be proved from the premise
$\Gamma \vdash_e e_1 : \tau_1$ and the assumption
$\Gamma \sqsubseteq (\mathsf{let}\ x = e_1;\ d)$. □

The machine-checkable proof in LF for this rule can be found in Section 5.6.

5.3.4 Dynamic clauses in the real LTAL

In LTAL, we use dynamic clauses to efficiently maintain various environments such
as label, register, and type maps. These maps are shown in Figure 3.2 as LRT. An
LTAL program is a tuple (L, R, T, B), where L is a label map, R is a register map,
T is a type definition map, and B is a list of basic blocks.

The type checker for LTAL programs starts by parsing the LRT maps so that
later they can be looked up efficiently. These maps can be quite large for real-life
programs, and the type checker often needs to look up the value of a label, the
register that a variable is assigned to, or the content of a type definition. A naive
implementation of the parsing and lookup rules for these environments, such as
sequential search, is straightforward but not efficient. In our actual implementation,
we use dynamic clauses to efficiently maintain various environments. The type checking
rules for parsing these maps are shown in Figure 5.6.

The rule SafeTheorem states that the LTAL type checking must establish the
safety theorem. The root rule of the LTAL type checker is the rule ProgTy, which
calls the label binding parsing rules. There are two rules, BindLabCons and
BindLabNil, for processing the label environment L. The BindLabCons rule matches if
the label environment currently being processed is not empty. Its subgoal has a dynamic
clause $\mathit{bindLab}(l_1, a_1, H)$. This dynamic clause is asserted at run time
whenever the rule BindLabCons is matched. The dynamic clauses have dynamic scope, and
thus if later the type checker wants to look up the value (which is an address) of a
label $l$, it can simply invoke a subgoal $\mathit{bindLab}(l, a, H)$, and $a$ will have
the value after the subgoal completes. This lookup operation takes constant time since
dynamic clauses are compiled into hash tables by the underlying logic programming
system. The rule BindLabNil matches if the label environment is empty, and it calls the
register environment processing rules. The register and type definition environments
are processed in a similar way.

$$
\frac{\vdash_p P\ H}{\mathit{safe}(P)}\ \text{SafeTheorem}
\qquad
\frac{\vdash_l P\ (L, R, T, B)\ L}{\vdash_p P\ (L, R, T, B)}\ \text{ProgTy}
$$

$$
\frac{\mathit{bindLab}(l_1, a_1, H) \to\ \vdash_l P\ H\ L}{\vdash_l P\ H\ (l_1 \mapsto a_1,\ L)}\ \text{BindLabCons}
\qquad
\frac{\vdash_r P\ (L, R, T, B)\ R}{\vdash_l P\ (L, R, T, B)\ \mathit{nil}}\ \text{BindLabNil}
$$

$$
\frac{\mathit{bindReg}(x_1, r_1, H) \to\ \vdash_r P\ H\ R}{\vdash_r P\ H\ (x_1 \mapsto r_1,\ R)}\ \text{BindRegCons}
\qquad
\frac{\vdash_t P\ (L, R, T, B)\ T}{\vdash_r P\ (L, R, T, B)\ \mathit{nil}}\ \text{BindRegNil}
$$

$$
\frac{\mathit{bindTy}(\alpha_1, t_1, H) \to\ \vdash_t P\ H\ T}{\vdash_t P\ H\ (\alpha_1 \mapsto t_1,\ T)}\ \text{BindTyCons}
\qquad
\frac{\vdash_b P\ (L, R, T, B)\ B}{\vdash_t P\ (L, R, T, B)\ \mathit{nil}}\ \text{BindTyNil}
$$

$$
\frac{L, R, T \vdash (\rho;\ \phi;\ cc)\ \{i_1; \ldots; i_k\}\ (\rho';\ \phi';\ cc')}
     {\vdash_b P\ (L, R, T, B)\ (l[\alpha : \kappa](m, cc, \phi) = i_1; \ldots; i_k)}\ \text{BlockCons}
\qquad
\frac{}{\vdash_b P\ (L, R, T, B)\ \mathit{nil}}\ \text{BlockNil}
$$

Figure 5.6: LTAL typing rules for environment management with dynamic context.

After processing these environments, the type checker invokes the basic-block checking rules, which simply call the instruction checking rules to check the block body, with the current typing environments as preconditions. Since the dynamic clauses are dynamically scoped, and the basic-block checking rules and instruction checking rules are invoked by the rule BindTyNil directly and by the other environment-processing rules indirectly, we can efficiently query dynamic clauses in the basic-block and instruction type-checking rules. The individual instruction checking rules are presented in Sections 3.7, 3.8, and 3.9 and in Appendix A.2.
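
The way the environment-processing rules assert dynamically scoped clauses can be sketched in Python. This is only an illustration under our own assumptions, not the Flit or Twelf implementation: the class `DynamicClauses`, the function `bind_labels`, and the sample addresses are hypothetical names and values chosen for the example.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

T = TypeVar("T")


class DynamicClauses:
    """Atomic dynamic clauses stored in a hash table keyed by (predicate, index)."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[str, int], int] = {}

    def lookup(self, predicate: str, index: int) -> int:
        # Constant-time lookup; raises KeyError if no such clause is in scope.
        return self._table[(predicate, index)]

    def with_clause(self, predicate: str, index: int, value: int,
                    subgoal: Callable[[], T]) -> T:
        # Assert the clause, solve the subgoal, then retract the clause,
        # so the clause is visible only while the subgoal is running.
        key = (predicate, index)
        if key in self._table:
            raise ValueError(f"duplicate index for {predicate}: {index}")
        self._table[key] = value
        try:
            return subgoal()
        finally:
            del self._table[key]


def bind_labels(clauses: DynamicClauses, labels: List[Tuple[int, int]],
                next_phase: Callable[[], T]) -> T:
    if not labels:
        # Analogue of BindLabNil: the label environment is exhausted.
        return next_phase()
    (label, address), rest = labels[0], labels[1:]
    # Analogue of BindLabCons: bind one label, then process the rest.
    return clauses.with_clause(
        "bindLab", label, address,
        lambda: bind_labels(clauses, rest, next_phase))


clauses = DynamicClauses()
# Inside the innermost subgoal, every label binding can be queried directly.
print(bind_labels(clauses, [(1, 4096), (2, 4100)],
                  lambda: clauses.lookup("bindLab", 2)))  # prints 4100
```

Because each binding is retracted when its subgoal returns, no binding outlives the checking run that introduced it, which is the dynamic-scoping behaviour the rules rely on.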


5.4  Logic  Programming  Engine 

For developing our semantic proofs of soundness we use Twelf, a sophisticated system with many useful features: in addition to an LF type checker, it contains a type reconstruction algorithm that permits users to omit many explicit parameters, a proof-search algorithm (which is like a higher-order Prolog interpreter), constraint regimes (e.g., linear programming over the exact rational numbers), mode analysis of parameters, a meta-theorem prover, a pretty-printer, a module system, a configuration system, an interactive Emacs mode, and more. We have found many of these features useful in proof development, but Twelf is certainly not a minimal proof checker; we would like to avoid the need to trust it. However, since Twelf does construct explicit proof objects internally, we can extract these objects to send to our minimal checker.

The previous section shows that efficient syntax-directed type-checking uses certain logic-programming constructs (dynamic clauses) but not others (backtracking), and that each Horn clause can be proved sound as a lemma in higher-order logic. This section describes a suitable logic programming interpreter implemented in Flit, our trusted LF proof checker. The logic programming engine was implemented by Stump [Wu et al., 2003]. Other aspects of Flit are described in Appel et al. [2002].

A type checking lemma (a rule together with its semantic proof) is represented in LF as “name : $\tau$ = exp.” The type $\tau$ encodes the type checking rule, and exp is a term of type $\tau$. By the Curry-Howard isomorphism [Howard, 1980], the term exp is a proof of the theorem that $\tau$ encodes. The name stands for the whole term exp with type $\tau$, i.e. the theorem and the proof.

The first step is to check the validity of the proof “exp.” Our checker Flit includes a simple LF checker, which is used to check that exp has LF type $\tau$. Flit’s LF checker is simpler than Twelf’s because Twelf performs type inference while Flit does not. We ask the adversary to send explicitly typed LF terms instead of implicitly typed terms; explicitly typed LF terms can be constructed by Twelf’s type inference module.

After  LF  type  checking,  the  proof  term  “exp”  is  not  useful  anymore.  Flit  runs 
a  simple  logic  programming  engine  to  interpret  the  type  checking  rules  as  a  logic 
program,  which  type  checks  input  machine  programs. 

To  achieve  a  concise  and  efficient  implementation,  we  impose  several  restrictions 
on  the  form  of  goals  and  programs.  If  these  are  violated,  the  interpreter  will  remain 
sound  but  may  fail  to  be  complete.  Specifically,  Flit’s  logic  programming  language 
makes  the  following  assumptions: 

Atomic dynamic clauses. Flit does not allow non-atomic dynamic clauses. Dynamic clauses are mainly used to maintain various environments efficiently. For this purpose, atomic dynamic clauses suffice.

Bounded execution. To avoid dynamic memory allocation during logic program execution, Flit uses a fixed-size memory to run logic programs. The only purpose of this restriction is to simplify the logic programming engine and thus simplify Flit, the trusted checker.

Determinism. Every subgoal in the input logic program is solved, if solvable, by the first matching clause in the set of static clauses and active dynamic clauses. Note that dynamic clauses follow the dynamic scoping rule. Under this condition of determinism, Flit does not need a backtracking mechanism. Our experience shows that backtracking can often be avoided during type checking when the type checker is carefully engineered.

Bounded indices. Define the index of a dynamic clause to be its first argument. We require that the indices of dynamic clauses be small natural numbers and distinct from each other. This allows simple and efficient indexing of dynamic clauses.

Prolog interpreters typically enter atomic dynamic clauses in a hash table for efficient matching, using one of the predicate’s arguments as the hash key. Our logic programs can be written with this very restricted form of clause indexing.
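
A minimal sketch of what the bounded-indices and bounded-execution restrictions make possible, written in Python purely for illustration. Flit itself is a C program and its data structures are not shown in this chapter; the class `BoundedClauseTable` and its methods are hypothetical.

```python
from typing import List, Optional, Tuple


class BoundedClauseTable:
    """Dynamic clauses indexed directly by a small natural-number first argument."""

    def __init__(self, capacity: int) -> None:
        # Fixed-size storage, allocated once (mirrors "bounded execution").
        self._slots: List[Optional[Tuple]] = [None] * capacity

    def assert_clause(self, index: int, args: Tuple) -> None:
        if not 0 <= index < len(self._slots):
            raise ValueError("index out of the supported range")
        if self._slots[index] is not None:
            raise ValueError("indices must be distinct")
        self._slots[index] = args

    def retract_clause(self, index: int) -> None:
        self._slots[index] = None

    def match(self, index: int) -> Optional[Tuple]:
        # At most one clause can match a given index, so no backtracking is needed.
        if 0 <= index < len(self._slots):
            return self._slots[index]
        return None


table = BoundedClauseTable(capacity=8)
table.assert_clause(3, ("even",))
print(table.match(3))  # prints ('even',)
```

With distinct small indices, the table degenerates from a general hash table into a direct array lookup, which is one plausible reading of why the restriction permits "simple and efficient indexing".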

Example. The even-odd proof scheme of Figure 5.5 is a logic program that conforms to these restrictions. The proof scheme (1) executes in linear time and space, and it is (2) syntax-directed. Its dynamic clauses bind($x$, $\tau$, $\Gamma$) are all atomic. In our implementation of this proof scheme, we put the $x$ argument of bind($x$, $\tau$, $\Gamma$) in the first position to conform to the bounded indices rule; and all the indices $x$ are manifest constants that are small integers. Our LTAL proof scheme used in the real PCC system also obeys these restrictions.

A logic program is presented to Flit’s logic programming engine as a set of LF terms, represented using an expression data structure [Appel et al., 2002]. Flit first transforms the logic clauses into a format that is convenient for executing logic programs, and then runs the logic program [Wu et al., 2003].


5.5 Proof Witnesses

Our even-odd example is overly simplistic in that there is a syntax-directed decision procedure for the main safety theorem: for an expression E, if the formula safe(E) is true, then the proof is easily found. In a real proof-carrying code application, the program E is in machine language; loops and recursion in the program, and quantified types in the type system, make type inference impossible.

Thus, in a PCC application, the input to the prover includes the program E and also an untrusted hint H. The hint provides loop invariants, type annotations, and other information which can be used by the prover. Because the hint is provided by the same adversary who provides the program, H cannot be assumed accurate, but it can still be useful in constructing the proof.

We will illustrate using the even-odd example. Let us provide a hint $H$ which is a list of type annotations, $x_1 : \tau_1, x_2 : \tau_2, \ldots, x_n : \tau_n$. We will write a prover that uses this hint (even though for this simple language the hint is not necessary). The root goal is now $\vdash_p H\ E$ instead of safe($E$).⁶

In addition to running the logic program on the root query $\vdash_p H\ E$, the checker verifies a (static) proof of the lemma,

$$\frac{\vdash_p H\ E}{\textit{safe}(E)}$$

We can’t use this as a logic-programming rule, i.e. we can’t use safe($E$) as our query, because then the logic program would have to “guess” $H$, which could require unbounded backtracking. The hint $H$ serves as a proof witness for $E$, in conjunction with the Prolog program (i.e. proof scheme) and its semantic soundness proof.


⁶ The text representation of the predicate $\vdash_p$ is “judge.prog”, as presented in Chapter 2.

Stage     Trust       Layer                  Role
stage 1   Trusted     Axioms
                      Expression Operators
stage 2   Untrusted   Semantic Model
                      Hint Operators
                      Clauses                Proof scheme
stage 3   Untrusted   Expression             Theorem to be proved
                      Hint                   Proof witness

Table 5.1: Layers of specification and proof.

5.5.1  Layers  of  specification  and  proof 

To handle proof-checking with hints, the checker software must process separately several layers of specification, semantics, proof, and logic-programming clauses. The layers of specification and proof are shown in Table 5.1. It is useful to think in terms of a proof consumer and an adversary.

Stage 1. The proof consumer specifies the Axioms of a logic, and defines the kinds of theorems she wants to check (that is, the language of expressions for which she wants safety theorems) by defining Expression Operators. One of the expression operators must be a predicate called safe.


Stage 2. Then the adversary sends a proof scheme, that is, a logic program (the syntactic type checker in the even-odd example). This program manipulates goals expressed using the Expression Operators and the Hint Operators. All the hint operators must be defined in terms of the underlying logic; the adversary is not permitted to add uninterpreted operators to the logic. All the Clauses of the logic program must be proved as derived lemmas in the logic, from the definitions of the expression and hint operators, as is done in Lemma 5.3.3.1.

The Semantic Model, sent by the adversary, is simply a set of supporting definitions and lemmas, defined in terms of the underlying logic, that can be useful in defining the hint operators and the clauses.

The adversary may define as many hint operators and clauses as he likes; however, there must be one operator called $\vdash_p$, and the semantic model must contain a lemma of the form,

$$\frac{\vdash_p H\ E}{\textit{safe}(E)}$$

The proof consumer uses the logical framework LF to check the well-formedness of all the definitions and the proofs of all the lemmas. Then she loads the Clauses into the subset-Prolog interpreter.

Stage 3. Finally, the adversary sends an Expression and a Hint. The consumer needs to verify that the expression obeys her desired safety property (this was the point of the whole exercise!), and she will do it using the adversary’s proof scheme. Since the proof scheme was proved sound (and she has checked the proof), if the logic program completes successfully then safe(E) must be valid.

For the even-odd system, the implementation of these stages is shown in Table 5.2; sample source code written in Twelf is in Section 5.6.
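
To make the role of the hint concrete, here is a small Python sketch of a syntax-directed even-odd checker that consumes an untrusted hint. It is not the thesis's logic program: the function names, the tuple encoding of expressions, and the parity rule for numerals are assumptions made for this example.

```python
from typing import Dict, List, Tuple, Union

Expr = Union[int, str, Tuple[str, "Expr", "Expr"]]
EVEN, ODD = "even", "odd"


def add_types(t1: str, t2: str) -> str:
    # even+even = even, odd+odd = even, and a mixed sum is odd.
    return EVEN if t1 == t2 else ODD


def type_of(expr: Expr, env: Dict[str, str]) -> str:
    if isinstance(expr, int):
        # Assumed rule for numerals: classify the literal by its parity.
        return EVEN if expr % 2 == 0 else ODD
    if isinstance(expr, str):
        # Variable: use the binding recorded when its declaration was checked.
        return env[expr]
    op, left, right = expr
    if op != "+":
        raise ValueError(f"unsupported operator: {op}")
    return add_types(type_of(left, env), type_of(right, env))


def check_safe(hint: List[Tuple[str, str]],
               decls: List[Tuple[str, Expr]],
               body: Expr) -> bool:
    """Return True only if the body is shown to be even, using the hint."""
    env: Dict[str, str] = {}
    if len(hint) != len(decls):
        return False
    try:
        for (hint_var, hint_ty), (var, rhs) in zip(hint, decls):
            if hint_var != var or var in env:
                return False
            if type_of(rhs, env) != hint_ty:
                return False  # the hint is untrusted, so every claim is checked
            env[var] = hint_ty
        return type_of(body, env) == EVEN
    except (KeyError, ValueError):
        return False


# let x = 4 ; let y = x + 8 ; x + y   with hint   typeof x even (typeof y even .)
decls = [("x", 4), ("y", ("+", "x", 8))]
print(check_safe([("x", EVEN), ("y", EVEN)], decls, ("+", "x", "y")))  # prints True
```

A wrong hint can only make the check fail; it cannot make an unsafe program pass, which is the property that lets the hint come from the adversary.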

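What is a proof witness? Stage 1 (loading axioms and safety predicate) needs to be done only once per safety policy. In a PCC application, stage 2 (loading the proof scheme) would need to be done when there are substantial modifications to the untrusted compiler. Stage 3 is repeated for each compiled program sent from the compiler to the consumer. Clearly, any work done in stages 1 and 2 can be amortized over many executions of stage 3. Although the foundational proof derives from information transmitted in stages 2 and 3, in measuring the effective size of proof witnesses we can consider just the Hint sent in stage 3.

Axioms

$$\frac{A \Rightarrow B \qquad A}{B}\ \text{imp\_e} \qquad \frac{\forall x.\,A(x)}{A(B)}\ \forall\text{\_e} \qquad \text{et cetera}$$

Expression Operators (Figure 5.2)

$$\begin{array}{rcl}
\textit{Var} & \stackrel{\text{def}}{=} & \textit{Num} \\
\textit{State} & \stackrel{\text{def}}{=} & \textit{Var} \to \textit{Num} \to \textit{Form} \\
\textit{Decl} & \stackrel{\text{def}}{=} & \textit{State} \to \textit{Form} \\
\textit{Exp} & \stackrel{\text{def}}{=} & \textit{State} \to \textit{Num} \to \textit{Form} \\
\textit{Prog} & \stackrel{\text{def}}{=} & (\textit{Decl}, \textit{Exp}) \\
(d; e) & \stackrel{\text{def}}{=} & (d, e) \\
\bullet & \stackrel{\text{def}}{=} & \lambda s.\ \textit{true} \\
\textit{let} & \stackrel{\text{def}}{=} & \lambda x.\lambda e.\lambda d.\ \lambda s.\ d\ s \wedge (\forall a.\ e\ s\ a \Rightarrow s\ x\ a) \\
x & \stackrel{\text{def}}{=} & \lambda s.\lambda a.\ s\ x\ a \\
n & \stackrel{\text{def}}{=} & \lambda s.\lambda a.\ a = n \\
+ & \stackrel{\text{def}}{=} & \lambda e_1.\lambda e_2.\ \lambda s.\lambda a.\ \exists a_1.\exists a_2.\ e_1\ s\ a_1 \wedge e_2\ s\ a_2 \wedge a = a_1\ \textit{plus}\ a_2 \\
\textit{safe} & \stackrel{\text{def}}{=} & \lambda p.\ \forall s.\ \textit{fst}(p)\ s \Rightarrow \exists a.\ \textit{snd}(p)\ s\ a \wedge \textit{isEven}(a)
\end{array}$$

Semantic Model (Figure 5.4 and Section 5.3.3)

$$\begin{array}{rcl}
\textit{Ty} & \stackrel{\text{def}}{=} & \textit{Num} \to \textit{Form} \\
\textit{Env} & \stackrel{\text{def}}{=} & \textit{State} \to \textit{Form} \\
\exists! & \stackrel{\text{def}}{=} & \lambda F.\ \exists x.\ F\ x \wedge \forall y.\ F\ y \Rightarrow x = y \\
\textit{upd} & \stackrel{\text{def}}{=} & \lambda x.\lambda a.\lambda s.\ \lambda y.\lambda b.\ \textit{if}\ (x = y)\ (a = b)\ (s\ y\ b) \\
\textit{even} & \stackrel{\text{def}}{=} & \lambda x.\ \exists n.\ \textit{isInt}(n) \wedge x = 2n \\
\textit{odd} & \stackrel{\text{def}}{=} & \lambda x.\ \exists n.\ \textit{isInt}(n) \wedge x = 2n + 1 \\
\vdash_p & \stackrel{\text{def}}{=} & \lambda h.\lambda p.\lambda \tau.\ \forall s.\ \textit{fst}(p)\ s \Rightarrow \exists a.\ \textit{snd}(p)\ s\ a \wedge \tau\ a \\
\vdash_d & \stackrel{\text{def}}{=} & \lambda \Gamma.\lambda h.\lambda d.\lambda e.\lambda \tau.\ \forall s.\ (\Gamma \subset d \wedge \Gamma\ s) \Rightarrow \exists a.\ (e\ s\ a \wedge \tau\ a) \\
\vdash_e & \stackrel{\text{def}}{=} & \lambda \Gamma.\lambda e.\lambda \tau.\ \forall s.\ \Gamma\ s \Rightarrow \exists a.\ (e\ s\ a \wedge \tau\ a) \\
\boxplus & \stackrel{\text{def}}{=} & \lambda \tau_1.\lambda \tau_2.\lambda \tau.\ \forall n_1.\forall n_2.\ \tau_1\ n_1 \Rightarrow \tau_2\ n_2 \Rightarrow \tau\ (n_1 + n_2) \\
\textit{bind} & \stackrel{\text{def}}{=} & \lambda x.\lambda \tau.\lambda \Gamma.\ \forall s.\ \Gamma\ s \Rightarrow \exists a.\ (s\ x\ a \wedge \tau\ a) \\
\subset & \stackrel{\text{def}}{=} & \lambda d_1.\lambda d_2.\ \forall s.\ d_1\ s \Rightarrow d_2\ s
\end{array}$$

Hint Operators

Ty, Env, even, odd, typeof

Clauses (Figure 5.5)

$$\begin{array}{l}
\textit{safe}(p) \leftarrow\ \vdash_p p : \textit{even}. \\
\vdash_p (d; e) : \tau \leftarrow d \vdash_d (d; e) : \tau. \\
\Gamma \vdash_d (\textit{typeof}\ x : \tau_1 ; h) \parallel (\textit{let}\ x = e_1 ; d) ; e : \tau \leftarrow \Gamma \vdash_e e_1 : \tau_1 \leftarrow (\textit{bind}(x, \tau_1, \Gamma) \rightarrow \Gamma \vdash_d (h \parallel d ; e) : \tau). \\
\Gamma \vdash_d (\bullet \parallel \bullet ; e) : \tau \leftarrow \Gamma \vdash_e e : \tau. \\
\Gamma \vdash_e x : \tau \leftarrow \textit{bind}(x, \tau, \Gamma). \\
\Gamma \vdash_e e_1 + e_2 : \tau \leftarrow \Gamma \vdash_e e_1 : \tau_1 \leftarrow \Gamma \vdash_e e_2 : \tau_2 \leftarrow \tau_1 \boxplus \tau_2 = \tau. \\
\textit{even} \boxplus \textit{even} = \textit{even}. \qquad \textit{even} \boxplus \textit{odd} = \textit{odd}. \\
\textit{odd} \boxplus \textit{odd} = \textit{even}. \qquad \textit{odd} \boxplus \textit{even} = \textit{odd}.
\end{array}$$

Expression

let x = 4 ; let y = x + 8 ; x + y

Hint

typeof x even (typeof y even •)

Table 5.2: Proof scheme for even-odd system. Not shown are the proofs (in higher-order logic) of all the clauses.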


5.6  Machine  Checkable  Proofs 

To illustrate the format of machine-checked soundness proofs of the type-checking clauses, we show here the proofs related to the rule BindTy (Lemma 5.3.3.1). Note that this is the version with hints described in Section 5.5; the rule without hints is quite similar.

Since the proof is written in LF, we begin with a brief introduction to LF. LF is based on the λ-calculus with dependent types, and it has syntactic entities at three levels: objects, types, and kinds. Types classify objects and kinds classify families of types. A deductive system is represented in LF using the judgements-as-types and derivations-as-terms principle [Harper et al., 1993]: judgements (theorems) are represented as types, and derivations (proofs) are represented as terms whose type is the representation of the judgement (theorem) that they prove. In this way proof checking of the object logic is reduced to type checking of the LF terms.

In general, a definition in LF has the form “name : $\tau$ = exp.” (including the dot). The type $\tau$ encodes the theorem to be proved, and exp is a term of type $\tau$. By the judgements-as-types and derivations-as-terms principle, the term exp is a proof of the theorem that $\tau$ encodes. The name stands for the whole term exp with type $\tau$, i.e. the theorem and the proof. LF and Twelf also permit introducing constructors with the form “name : $\tau$.”

The entire machine-checkable proof in LF is shown in Figure 5.7.

check_decl_cons :
  |-d (typeof V Tv HINT) (let V Ev D) Gamma E T <-
  |-e Gamma Ev Tv <-
  (bind V Tv Gamma
    -> |-d HINT D Gamma E T) =
  [p1: bind V Tv Gamma -> |-d HINT D Gamma E T]
  [p2: |-e Gamma Ev Tv]
  |-d_i [s]
    [p3: pf (sub_env @ Gamma @ (let V Ev D))]
    [p4: pf (Gamma @ s)]
  cut (bind_i [s_v]
        [p7: pf (Gamma @ s_v)]
        |-e_l p2 p7 [a_v]
        [p5: pf (Ev @ s_v @ a_v)]
        [p6: pf (Tv @ a_v)]
        cut (let_e1 (sub_env_e p3 p7) p5)
        [p8: pf (s_v @ c V @ a_v)]
        exists_i a_v
        (and_i p8 p6))
  [p10: bind V Tv Gamma]
  cut (sub_env_i [s']
        [p12: pf (Gamma @ s')]
        let_e2 (sub_env_e p3 p12))
  [p20: pf (sub_env @ Gamma @ D)]
  |-d_e (p1 p10) p20 p4.

Figure 5.7: Machine-checkable proof of BindTy in LF.

The notation “[x:t]A” denotes $\lambda x : t.\ A$. In the proof above we first introduce two λ-bindings; that is, we assume that the two premises of the typing rule hold. Then we use the |-d introduction rule |-d_i to get a proof of

|-d (typeof V Tv HINT) (let V Ev D) Gamma E T,

i.e. the conclusion.

The rule |-d_i introduces three λ-bindings: s, p3, and p4. Note that the type of s is omitted and Twelf will reconstruct it as State. Lemma cut is as follows:

cut: pf A -> (pf A -> pf B) -> pf B =
  [p1: pf A] [p2: pf A -> pf B] imp_e (imp_i p2) p1.

The lemmas imp_i and imp_e (modus ponens) are the introduction and elimination lemmas for implication. In general, the lemma cut means that if we have a proof of A, and a function which maps a proof of A to a proof of B, then we can get a proof of B. This is similar to imp_e or modus ponens, but cut uses the LF function type -> instead of object-level implication. When using cut, we first prove some formula A, then bind this proof (give it a name so that we can refer to it later) and continue to prove the goal (B in this case). The @ is term application at the object-logic level.
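
For readers more familiar with other proof assistants, the shape of cut can be restated in Lean 4. This is an analogy, not a translation of the LF development: Lean identifies propositions with types directly, so there is no separate pf judgement or object-level implication, and the proof term collapses to function application.

```lean
-- Lean 4 sketch; A and B are arbitrary propositions.
-- p1 is a proof of A; p2 maps any proof of A to a proof of B.
theorem cut {A B : Prop} (p1 : A) (p2 : A → B) : B :=
  p2 p1
```

In the LF version the detour through imp_i and imp_e is needed because the conclusion must be derived inside the object logic, whereas here the meta-level function type and implication coincide.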

5.7  Scaling  Up  to  Foundational  PCC 

The even-odd type system is just a toy example to demonstrate some of the principles. Our real applications are in proof-carrying code and distributed authorization. Our checking system scales up to these examples quite well, as we will explain.

In  our  application  to  foundational  PCC,  the  hint  H  is  an  expression  in  the 
LTAL  calculus  presented  in  Chapter  3,  and  the  expression  E  is  a  machine-language 
program,  that  is,  a  sequence  of  32-bit  natural  numbers. 

Figure 2.1 shows the major components of our foundational proof-carrying code framework. The LTAL clauses are a set of clauses in our restricted Prolog subset. Axioms & Architecture Spec are preloaded into our Checker and must be trusted as axioms and trusted definitions.⁷ Between these two components are proofs, based on the axioms, of all the LTAL clauses.

A source program is compiled into a machine-code program and an LTAL expression. The compiler is not trusted, because it is a large program that may have bugs. The trusted checker receives the LTAL clauses, along with their soundness proofs in higher-order logic; checks the soundness proofs; and then runs the LTAL checker, which is a syntax-directed computation in our subset Prolog.

⁷ A trusted definition is one that is used in the statement of the theorem to be proved; an untrusted definition is used only in the proof.

LTAL  is  presented  in  Chapter  3;  the  LTAL  semantic  model  is  briefly  presented 
in  Chapter  4.  In  this  chapter,  we  focus  on  the  aspects  of  the  LTAL  calculus  that 
enable  it  to  be  type-checked  by  our  tiny  trusted  checker. 

Because a source-language programmer never sees the LTAL program, we can design the LTAL calculus to be checkable in our very restricted language. To use the checker’s limited support for dynamic clauses, we have arranged LTAL so that all identifiers are small integers, no two variables have the same identifier, and program labels, local variables, and type abbreviations are represented by disjoint sets of integers. To make the LTAL type system entirely syntax-directed, we use explicit coercions to guide the typing rules, instead of relying on subtyping, which would require a search.

We use the simple and limited arithmetic provided by the checker: addition, multiplication, and truncating division on 32-bit natural numbers. Other operators are synthesized, such as A > B by div B A 0, using truncating division.
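
The encoding of A > B through truncating division can be checked with a few lines of Python. This is a sketch under our own assumptions: we read div B A 0 as "the truncated quotient of B by A is 0", and the guard for A = 0 is our addition, since the text does not say how the checker treats division by zero.

```python
def greater_than(a: int, b: int) -> bool:
    """Decide a > b for natural numbers using only truncating division."""
    if a == 0:
        # Assumed convention: no natural number is smaller than 0.
        return False
    # For a >= 1, b // a == 0 holds exactly when b < a.
    return b // a == 0


print(greater_than(5, 3), greater_than(3, 5), greater_than(4, 4))  # prints True False False
```

The trick works because, for A at least 1, the truncated quotient of B by A is zero precisely when B is strictly smaller than A.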

The  LTAL  typing  rules,  such  as  the  one  shown  in  Section  3.3,  though  bigger 
and  more  complicated  than  the  rules  we  presented  for  the  even-odd  system,  can  be 
executed  by  our  simple  subset  Prolog  interpreter. 


5.8  Experimental  Results 

We have measured our trusted checker on the even-odd microbenchmark and on some small but nontrivial LTAL benchmarks. Gross statistics about these proof schemes are shown in Table 5.3.

                        EvenOdd        LTAL
Core Axioms                 341         341   lines of LF
Application-specific         10        1522   lines of LF
Expression Operators         40           2   lines of LF
Semantic Model              218    ~100,000   lines of LF
Hint Operators               10         500   lines of LF
Clauses                      12       3,500   lines of LF
Expression                  ~7N         ~2N   tokens
Hint                        ~4N        ~30N   tokens

Table 5.3: Measurements: system size.

Line counts for LF do not include blank lines or comments. Expression sizes for EvenOdd are measured with N as the number of declarations, each declaration of the form let $x_i$ = $x_j$ + $x_k$; which is 7 tokens per declaration. Expression sizes for LTAL are measured with N as the number of machine instructions (32-bit integers) in the program to be proved safe, with two tokens per integer, for example:

2551193600  ;  2181292040  ;  2214748172  ;  ...  ;  nil 
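
The token counts quoted above can be reproduced with a short Python sketch. The whitespace tokenizer and the helper `ltal_expression` are assumptions made for illustration; the thesis does not describe the actual lexer.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: tokens are separated by whitespace.
    return len(text.split())


def ltal_expression(words: List[int]) -> str:
    # Render machine words in the "w1 ; w2 ; ... ; nil" form shown above.
    return " ; ".join(str(w) for w in words) + " ; nil"


decl = "let x1 = x2 + x3 ;"
prog = ltal_expression([2551193600, 2181292040, 2214748172])
print(count_tokens(decl))  # prints 7
print(count_tokens(prog))  # prints 7
```

For N machine words the rendered expression has 2N + 1 tokens (each word plus its separator, and the final nil), which is the roughly 2N figure reported for LTAL.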


From this it should be clear why LTAL has only two Expression Operators; everything shown in Figures 3.2 and 3.16 is actually a Hint Operator.

The logic program is the set of LTAL typing rules. There are several hundred LTAL clauses or typing rules, some of which take dozens of lines to write down, such as the one we showed in Section 3.3 for the SPARC add instruction. The LTAL semantic model, which provides proofs of all these clauses, is rather intricate and is the subject of several other papers and PhD theses [Appel and Felty, 2000; Appel and McAllester, 2001; Ahmed et al., 2002; Ahmed, 2004; Swadi, 2003; Tan et al., 2004].

Since the clauses are written in a subset of Prolog, we can execute them in a standard Prolog system. For each benchmark, we compare execution time in the (highly optimized) SICStus Prolog compiler with execution time in the Flit interpreter. The results are shown in Table 5.4.

          Input size   SICStus    Twelf     Flit
EvenOdd   N = 100        0.002     0.99     0.01
          N = 1000       0.030   > 3600     0.05
          N = 10000      1.460              0.26
LTAL      N = 32         0.005     1.21     0.43
          N = 870        0.183     1018     1.32
          N = 1816       0.432   > 3600     2.19

Table 5.4: Measurements: safety checking performance.

All times are in seconds on a 2.2 GHz Pentium 4. Twelf is not designed for performance, but its advanced features make it a convenient tool for us to develop machine-checkable proofs in LF. Flit is faster than SICStus Prolog for large EvenOdd examples, but EvenOdd is unrealistic because the Prolog program has only a few simple clauses. Parsing the expression and hint contributes a significant portion of execution time for EvenOdd examples in SICStus Prolog. Also, the dynamic clause indexing in Flit is tailored to our specific applications, so it could be more efficient for our examples than general-purpose Prolog systems. When checking LTAL, Flit is about five times slower than SICStus; this performance may be acceptable in the intended application.
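
As a check on these comparisons, the ratios of Flit time to SICStus time can be worked out from the entries of Table 5.4. The arithmetic below uses only the reported timings; the rounding is ours.

```latex
\begin{align*}
\text{LTAL, } N = 32:       &\quad 0.43 / 0.005 = 86 \\
\text{LTAL, } N = 870:      &\quad 1.32 / 0.183 \approx 7.2 \\
\text{LTAL, } N = 1816:     &\quad 2.19 / 0.432 \approx 5.1 \\
\text{EvenOdd, } N = 10000: &\quad 0.26 / 1.460 \approx 0.18
\end{align*}
```

The factor of about five therefore corresponds to the largest LTAL input; on the smaller LTAL inputs the ratio is higher, which would be consistent with a fixed startup cost in Flit, although the table alone does not establish the cause.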

Of  course,  execution  in  SICStus  loses  the  benefits  of  the  tiny  trusted  base:  In 
that  mode  we  don’t  mechanically  connect  the  soundness  proof  for  the  LTAL  clauses 
to  the  actual  SICStus  execution,  and  the  SICStus  Prolog  compiler  and  interpreter 
also  become  part  of  the  trusted  base. 

Table 3.2 in Chapter 3 compares the proof-checking time of the life benchmark with the time necessary for the ML compiler to generate the program. For applications where the output of a compiler is to be checked by a trusted checker, it’s desirable that checking time be small compared to compile time. SML/NJ can compile this benchmark in 0.49 seconds; our LTAL-generating FPCC/ML compiler takes 3.0 seconds (the slowdown is partly because it takes extra time to preserve types, and mostly because we have not engineered the back end for speed). LTAL type-checking takes 0.43 seconds in SICStus and 2.2 seconds in Flit.

The  Flit  software  currently  comprises  about  1169  lines  of  C  code:  the  803  lines 
described  in  Section  5.1.2  for  parsing  axioms,  loading  proof  graphs,  and  LF  checking 
the  proofs  have  grown  to  852  lines;  our  new  logic-program  interpreter  is  about  282 
lines,  and  there  are  about  35  lines  to  manage  the  stages  described  in  Section  5.5.1. 

Necula’s oracle-based Prolog interpreter [Necula and Rahul, 2001] is about 800 lines of C code. It should be straightforward to use our style of LF proof-checking of Prolog clauses, but use oracle-based execution instead of our interpreter. Then, instead of an 1169-line C program, we would have a 1700-line program. In such a system, the proof witnesses would be just as tiny as Necula’s, and the trusted base would be somewhat larger than that of the system we have described in this paper.
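
The line counts quoted in this section fit together as follows. The second line assumes that the roughly 800-line oracle-based interpreter would replace, rather than supplement, the 282-line logic-program interpreter; that reading is our assumption, chosen because it reproduces the quoted figure of about 1700 lines.

```latex
\begin{align*}
852 + 282 + 35 &= 1169 \\
1169 - 282 + 800 &= 1687 \approx 1700
\end{align*}
```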

Our initial implementation of Flit has no garbage collector. Checking the N = 1816 LTAL example consumes approximately 4 million heap nodes without garbage collection. To scale Flit to significantly larger inputs, garbage collection would be necessary. Our implementation of an allocator with a two-space copying collector is 70 lines of C code.


5.9 Conclusion

To  make  a  trustworthy  proof-checker  with  small  witnesses,  one  should  define  a 
language  for  proof-schemes,  with  a  way  to  represent  and  check  soundness  theorems 
for  the  proof  schemes;  then  one  should  implement  an  interpreter  to  execute  the 
proof  scheme  on  the  theorem  and  the  witness. 

Pollack  explained  much  of  this  in  “How  to  believe  a  machine-checked  proof” 
[Pollack,  1998]: 

... I suggest that the “programming language” for the checking program be a logical framework [such as] the Edinburgh Logical Framework .... we [could] program a checker in the internal language of the framework .... The question then arises: where will we find a believable implementation of a logical framework?

We ask you to believe very little. Our implementation is based on LF, higher-order logic, and a small subset of pure Prolog, all of which are well understood; and our implementation is about as small as possible: to trust our system, there are fewer than 1200 lines of code that you have to understand.


Chapter  6 


Conclusion  and  Future  Work 


In  summary,  the  adoption  of  a  low-level  typed  assembly  language,  construction  of 
its  semantic  model  and  machine-checkable  soundness  proof,  and  integration  of  a 
simple  logic  programming  engine  in  the  proof  checker  are  our  main  design  choices, 
and  they  serve  as  the  interfaces  between  the  compiler,  the  proof  checker  and  the 
proofs.  The  main  contribution  of  the  thesis  is  the  design  of  these  interfaces. 

We have designed a syntactic low-level typed assembly language, called LTAL, with a semantic model that backs up its soundness with a machine-checkable proof. The semantic modeling technique makes LTAL easily and safely extensible. It has a rich set of expressive constructors, yet its type-checking is decidable and syntax-directed. We have implemented a prototype compiler (by Chen and Fang [Chen et al., 2003], based on SML/NJ) that transforms Core ML programs to SPARC code annotated with LTAL programs.

In a Proof-Carrying Code (PCC) system, an untrusted prover (code producer) must convince a trusted checker (code consumer) of the validity of a theorem by sending a proof, which the trusted checker must then check. Proof checking in our system is done mainly in two steps [Wu et al., 2003]. First, the LF proofs for the LTAL type checking rules are checked; this is standard LF proof checking. Second, the LTAL type checker (the set of LTAL type checking rules written in the fashion of logic clauses) is interpreted by a logic programming engine.

To this end, we have built a tiny and trustworthy proof checker, called Flit
[Appel et al., 2002; Wu et al., 2003], that permits small proof witnesses and
machine-checkable proofs of the soundness of the system. Flit includes an
efficient LF proof checker and a simple yet efficient logic programming engine
that implements a subset of Prolog. The LF checker is used to verify the
soundness proof of the type system chosen by the compiler or user, and the logic
programming engine is used to interpret the verified type checker to check
machine code together with some proof hints from the compiler. The LTAL type
checker is written in such a way that it can be interpreted by Flit's logic
programming engine without backtracking.
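
To make the idea of interpreting a type checker written as logic clauses without backtracking concrete, the following is a minimal Python sketch. It is not Flit and does not reproduce its clause syntax; the term encoding (tuples, capitalised strings as variables), the helper names, and the two example clauses are all assumptions made for illustration. The interpreter commits to the first clause whose head unifies with the goal and never retries another clause if a subgoal fails.

```python
import itertools

def is_var(t):
    # Assumed convention for this sketch: variables are capitalised strings.
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    # Follow variable bindings in substitution s.
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    # First-order unification (no occurs check, to keep the sketch short).
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def rename(t, n):
    # Give clause variables fresh names for each clause use.
    if is_var(t):
        return f"{t}_{n}"
    if isinstance(t, tuple):
        return tuple(rename(x, n) for x in t)
    return t

_fresh = itertools.count()

def solve(goal, s, clauses):
    # Commit to the first clause whose head unifies; never backtrack.
    for head, body in clauses:
        n = next(_fresh)
        s1 = unify(goal, rename(head, n), s)
        if s1 is None:
            continue
        for sub in body:
            s1 = solve(rename(sub, n), s1, clauses)
            if s1 is None:
                return None  # committed choice: no other clause is tried
        return s1
    return None

def resolve(t, s):
    # Apply the substitution fully to a term.
    t = walk(t, s)
    if isinstance(t, tuple):
        return tuple(resolve(x, s) for x in t)
    return t

# Hypothetical syntax-directed typing clauses for a toy expression language.
CLAUSES = [
    (("typ", ("lit", "N"), "int"), []),
    (("typ", ("add", "A", "B"), "int"),
     [("typ", "A", "int"), ("typ", "B", "int")]),
]

answer = solve(("typ", ("add", ("lit", 1), ("lit", 2)), "T"), {}, CLAUSES)
print(resolve("T", answer))
```

Because the example clauses are syntax-directed (at most one head matches any goal), committing to the first matching clause loses no solutions, which is the property that lets a type checker of this shape run on an engine without backtracking.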

In the future, there are several directions in which to extend our Foundational
Proof-Carrying Code (FPCC) system. One direction is to build FPCC systems for
object-oriented languages such as Java and C# based on the current system. The
core part of LTAL and the soundness proof should be reusable.

Another direction is to strengthen the current safety policy and to build FPCC
systems that carry proofs for stronger properties. To specify stronger safety
properties at the machine level, we sometimes need stronger constructors, such
as types, in our Trusted Computing Base (TCB). We are currently investigating
how to extend our TCB to include some type constructors so that we can specify
interfaces between two low-level code modules.


Appendix  A 

LTAL  Static  Semantics 


A.1  Coercion  Rules

- 77 —  Coerceld 

p;  LRT  h  ct^t 


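\[
\frac{\rho;\ \mathit{LRT} \vdash_c \tau \xrightarrow{c_1} \tau' \qquad \rho;\ \mathit{LRT} \vdash_c \tau' \xrightarrow{c_2} \tau''}
     {\rho;\ \mathit{LRT} \vdash_c \tau \xrightarrow{c_1 \circ c_2} \tau''}
\quad \text{CoerceComposition}
\]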


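\[
\frac{}{\rho;\ \mathit{LRT} \vdash_c \tau[\mu\alpha{:}\kappa.\tau/\alpha] \xrightarrow{\mathit{cfold}[\mu\alpha:\kappa.\tau]} \mu\alpha{:}\kappa.\tau}
\quad \text{CoerceFold}
\]

\[
\frac{}{\rho;\ \mathit{LRT} \vdash_c \mu\alpha{:}\kappa.\tau \xrightarrow{\mathit{cunfold}} \tau[\mu\alpha{:}\kappa.\tau/\alpha]}
\quad \text{CoerceUnfold}
\]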
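
The substitution used by the fold and unfold coercions can be illustrated with a short Python sketch. The tuple encoding of types, the function names, and the example list type are assumptions made here for illustration; kinds are omitted and types are assumed closed, so no capture-avoiding renaming is needed.

```python
def subst(t, name, repl):
    """Replace free occurrences of type variable `name` in t by repl.

    Types are tuples: ("var", a), ("mu", a, body), or (constructor, child...).
    Every child of a constructor is assumed to be a type.
    """
    tag = t[0]
    if tag == "var":
        return repl if t[1] == name else t
    if tag == "mu":
        if t[1] == name:          # `name` is rebound here; nothing to replace
            return t
        return ("mu", t[1], subst(t[2], name, repl))
    return (tag,) + tuple(subst(child, name, repl) for child in t[1:])

def unfold(mu):
    """mu a. t  becomes  t[mu a. t / a]."""
    if mu[0] != "mu":
        raise TypeError("unfold expects a recursive type")
    return subst(mu[2], mu[1], mu)

def check_fold(source, target_mu):
    """A fold to target_mu is accepted only from the unfolding of target_mu."""
    return source == unfold(target_mu)

# Hypothetical example: integer lists, mu a. unit U (int32 x a).
INT_LIST = ("mu", "a",
            ("union", ("unit",), ("pair", ("int32",), ("var", "a"))))

print(unfold(INT_LIST))
```

Unfolding replaces the bound variable by the whole recursive type, so one step of unfolding `INT_LIST` exposes a union whose second branch mentions `INT_LIST` again.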


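\[
\frac{\tau_1 : \kappa}
     {\rho;\ \mathit{LRT} \vdash_c \tau_2[\tau_1/\alpha] \xrightarrow{\mathit{cpack}[\tau_1,\ \exists\alpha:\kappa.\tau_2]} \exists\alpha{:}\kappa.\tau_2}
\quad \text{CoercePack}
\]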
\[
\frac{}{\rho;\ \mathit{LRT} \vdash_c \tau_u \xrightarrow{\mathit{cinjection}(\mathrm{sum}(\tau_r,\tau_u))} \mathrm{sum}(\tau_r, \tau_u)}
\quad \text{CoerceSumInjection}
\]


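\[
\frac{}{\rho;\ \mathit{LRT} \vdash_c \mathrm{sum}(\tau_r, \bot) \xrightarrow{\mathit{csum2range}} \tau_r}
\quad \text{CoerceSum2Range}
\]

\[
\frac{\tau \text{ is not a union type}}
     {\rho;\ \mathit{LRT} \vdash_c \mathrm{sum}(\bot, \tau) \xrightarrow{\mathit{csum2boxedone}} \tau}
\quad \text{CoerceSum2Boxedone}
\]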


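\[
\frac{\begin{array}{c}
      \tau = \tau_1 \cup \tau_2 \cup \ldots \cup \tau_n \\
      \tau_i = \mathrm{field}(0, \mathrm{int}_{=}(t_i)) \cap \tau_i' \quad (\text{for all } 1 \le i \le n) \\
      (\alpha \text{ is a fresh type variable})
      \end{array}}
     {\rho;\ \mathit{LRT} \vdash_c \tau \xrightarrow{\mathit{csum2hastag}} \exists\alpha.\mathrm{hastag}(\alpha, \tau)}
\quad \text{CoerceSum2Hastag}
\]

\[
\frac{\tau \text{ is neither a union type, nor a bottom type}}
     {\rho;\ \mathit{LRT} \vdash_c \mathrm{hastag}(\tau_{tag}, \tau) \xrightarrow{\mathit{cunhastag}} \tau}
\quad \text{CoerceUnhastag}
\]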


- - - - - - -  CoerceSingleton2Range 

_  crange[ni,ri2] 

p]LRT\~cn  r—>  range(ni,n2) 


- c2int  -  CoerceSingleton2Int32 

p;LRT\rcn  int32 


\[
\frac{}{\rho;\ \mathit{LRT} \vdash_c \mathrm{range}(n_1, n_2) \xrightarrow{\mathit{c2int32}} \mathrm{int32}}
\quad \text{CoerceRange2Int32}
\]


_ i  ^  0 _ 

p;  LRT  bc  int=(i)  ^  int^(0) 


CoerceSingleton2Nonzero 
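\[
\frac{}{\rho;\ \mathit{LRT} \vdash_c \tau_1 \xrightarrow{\mathit{cinj1}[\tau_1 \cup \tau_2]} \tau_1 \cup \tau_2}
\quad \text{CoerceInjectionLeft}
\]

\[
\frac{}{\rho;\ \mathit{LRT} \vdash_c \tau_2 \xrightarrow{\mathit{cinj2}[\tau_1 \cup \tau_2]} \tau_1 \cup \tau_2}
\quad \text{CoerceInjectionRight}
\]

\[
\frac{}{\rho;\ \mathit{LRT} \vdash_c \tau_1 \cap \tau_2 \xrightarrow{\mathit{cproj1}} \tau_1}
\quad \text{CoerceProjectionLeft}
\]

\[
\frac{}{\rho;\ \mathit{LRT} \vdash_c \tau_1 \cap \tau_2 \xrightarrow{\mathit{cproj2}} \tau_2}
\quad \text{CoerceProjectionRight}
\]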


p-LRT  bcdef(^)  T(0) 


CoerceName 


cdef(^) 

p;LRT\-cT(@)  -4  def(0) 


CoerceDef 


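\[
\frac{\rho;\ \mathit{LRT} \vdash_c \tau_1 \xrightarrow{c_1} \tau_1' \qquad \rho;\ \mathit{LRT} \vdash_c \tau_2 \xrightarrow{c_2} \tau_2'}
     {\rho;\ \mathit{LRT} \vdash_c \tau_1 \cup \tau_2 \xrightarrow{\mathit{cunion}(c_1, c_2)} \tau_1' \cup \tau_2'}
\quad \text{CoerceUnion}
\]

\[
\frac{\rho;\ \mathit{LRT} \vdash_c \tau \xrightarrow{c_1} \tau_1 \qquad \rho;\ \mathit{LRT} \vdash_c \tau \xrightarrow{c_2} \tau_2}
     {\rho;\ \mathit{LRT} \vdash_c \tau \xrightarrow{\mathit{cinters}(c_1, c_2)} \tau_1 \cap \tau_2}
\quad \text{CoerceIntersection}
\]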
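
Since each coercion term determines which rule applies, checking a coercion is syntax-directed. The following Python sketch illustrates this for a handful of the rules above (composition, injection, projection, and intersection). The tuple encoding of coercions and types and the function name `coerce` are assumptions for illustration only; the environments ρ and LRT are omitted because these particular rules do not consult them.

```python
def coerce(c, tau):
    """Return the target type of applying coercion c to type tau.

    Coercions are tuples whose first element names the rule, for example
    ("compose", c1, c2), ("cproj1",), ("cinj1", union_type), ("cinters", c1, c2).
    Types are atoms (strings) or tuples ("union", t1, t2) / ("inter", t1, t2).
    Raises TypeError when no rule applies.
    """
    tag = c[0]
    if tag == "compose":                      # c1 o c2: apply c1, then c2
        return coerce(c[2], coerce(c[1], tau))
    if tag in ("cproj1", "cproj2"):           # t1 /\ t2 to t1 (or t2)
        if not (isinstance(tau, tuple) and tau[0] == "inter"):
            raise TypeError("projection needs an intersection type")
        return tau[1] if tag == "cproj1" else tau[2]
    if tag in ("cinj1", "cinj2"):             # t1 (or t2) to t1 \/ t2
        union = c[1]
        side = 1 if tag == "cinj1" else 2
        if union[0] != "union" or union[side] != tau:
            raise TypeError("injection target does not contain the source")
        return union
    if tag == "cinters":                      # t to t1 /\ t2
        return ("inter", coerce(c[1], tau), coerce(c[2], tau))
    raise TypeError(f"unknown coercion {tag!r}")

# Project the left component, then inject it into a union.
c = ("compose", ("cproj1",), ("cinj1", ("union", "int32", "boxed")))
print(coerce(c, ("inter", "int32", "t")))
```

Each branch inspects only the head of the coercion term, so the checker never has to choose between rules.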


p;  LRT  bc  r  A  r' 


.  .  cfield(c)  . 

p;  Li?T  bc  field(rj,  r)  fielder) 


CoerceField 
LRT  b  l  :  codeptr[a  :  k] (m,  cc,  n  :  t)

- , ,  , -  CoerceAddr2Code 

p;  LRT  bc  addr(Z)  ca  ^c°  c  codeptr[a  :  k]  (rn,  cc,n  :  r) 


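\[
\frac{}{\rho;\ \mathit{LRT} \vdash_c \mathrm{offset}(0, \tau) \xrightarrow{\mathit{coffset0}} \tau}
\quad \text{CoerceOffset0}
\]

\[
\frac{}{\rho;\ \mathit{LRT} \vdash_c \tau \xrightarrow{\mathit{c2offset0}} \mathrm{offset}(0, \tau)}
\quad \text{Coerce2Offset0}
\]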


t'  =  T^n\  where  n  =  \a\  :  Ki,a  :  k\ 
m\r']  =  m'  cc[t']  =  cc'  0[r']  —  4>' 
where  [■]  denotes  type  application1 

-  CoercePtapp 

cptapp(r) 

p;  LRT  bc  codeptr[a;i  :  up,  a  :  K\(m,  cc,  0)  ^  codeptr[a  :  /c](m  ,  cc  ,  0' ) 


A.2  Instruction  Typing  Rules

LRT ;  p;  0  b  v  :  3a  :  /c.r  I  t  0 

LRT  b  (p;  0;  cc)  {(a,  no)  =  open(n)}  (p,  a  :  k;  0,  no  :  r;  cc)  ? 


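\[
\frac{\begin{array}{c}
      \mathit{LRT};\ \rho;\ \phi \vdash v' : \tau \\
      \phi' = \phi,\ v : \tau, \ \text{if } \mathrm{rmap}(v) = \mathrm{rmap}(v') \\
      \phi' = (\phi \setminus v),\ v : \tau, \ \text{if } \mathrm{rmap}(v) \ne \mathrm{rmap}(v')
      \end{array}}
     {\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{v = v'\}\ (\rho;\ \mathcal{H};\ \phi';\ cc)}
\quad \text{InstrMove}
\]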


1. We use de Bruijn index representation and explicit substitution calculus
[Abadi et al., 1990] notation in this rule (CoercePtapp).
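\[
\frac{\mathit{LRT};\ \rho;\ \phi \vdash v_1 : \mathrm{int} \qquad \mathit{LRT};\ \rho;\ \phi \vdash v_2 : \mathrm{int}}
     {\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{v = v_1\ \mathit{op}\ v_2\}\ (\rho;\ \mathcal{H};\ (\phi \setminus v),\ v : \mathrm{int};\ cc)}
\quad \text{InstrALU}
\]

\[
\frac{\mathit{LRT};\ \rho;\ \phi \vdash v_1 : \mathrm{int}_{=}(n_1) \qquad \mathit{LRT};\ \rho;\ \phi \vdash v_2 : \mathrm{int}_{=}(n_2) \qquad n = n_1 + n_2}
     {\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{v = v_1 +_i v_2\}\ (\rho;\ \mathcal{H};\ (\phi \setminus v),\ v : \mathrm{int}_{=}(n);\ cc)}
\quad \text{InstrALUi}
\]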
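
As an illustration of how such a rule updates the register typing, the following Python sketch implements the singleton-integer addition case. The dictionary encoding of the register environment, the tuple `("int=", n)` for a singleton integer type, and the function name are assumptions for illustration; the operands are assumed to be registers looked up in the environment, and the other components of the state are left out because the rule does not change them.

```python
def typecheck_alu_i(phi, v, v1, v2):
    """Type v = v1 +i v2 when both operands have singleton integer types.

    phi maps register names to types; ("int=", n) encodes the singleton
    type of the integer n. Returns the updated environment
    (phi without v), v : int=(n1 + n2); the input is not mutated.
    """
    t1, t2 = phi.get(v1), phi.get(v2)
    for t in (t1, t2):
        if not (isinstance(t, tuple) and t[0] == "int="):
            raise TypeError("operand is not a singleton integer")
    n = t1[1] + t2[1]
    new_phi = {r: t for r, t in phi.items() if r != v}  # remove old binding of v
    new_phi[v] = ("int=", n)
    return new_phi

phi = {"r1": ("int=", 3), "r2": ("int=", 4), "r3": ("int32",)}
print(typecheck_alu_i(phi, "r3", "r1", "r2")["r3"])
```

Removing the old binding of the destination before adding the new one is what allows the destination to coincide with one of the operands.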


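\[
\frac{}{\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{v = \mathrm{sethi}(n)\}\ (\rho;\ \mathcal{H};\ (\phi \setminus v),\ v : \mathrm{int}_{=}(n * 4096);\ cc)}
\quad \text{InstrSethi}
\]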


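\[
\frac{\mathit{LRT};\ \rho;\ \phi \vdash v' : \tau}
     {\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{v = \mathrm{load}(v')\}\ (\rho;\ \mathcal{H};\ (\phi \setminus v),\ v : \tau;\ cc)}
\quad \text{InstrLoad}
\]

\[
\frac{\mathit{LRT};\ \rho;\ \phi \vdash v' : \tau}
     {\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{v = \mathrm{store}(v')\}\ (\rho;\ \mathcal{H};\ (\phi \setminus v),\ v : \tau;\ cc)}
\quad \text{InstrStore}
\]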


LRT ;  p;  0  b  :  addr(/)  LRT ;  p;  0  b  t>2  :  diff (p,  /) 

LRT  b  (p;  J^7;  0;  cc)  {v  =  addradd(vi,  v2)}  (p;  J^7;  (<j>\v),v  :  addr(p);  cc) 


InstrAddrAdd 


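\[
\frac{\mathit{LRT};\ \rho;\ \phi \vdash v_1 : \mathrm{field}(i, \tau) \qquad \mathit{LRT};\ \rho;\ \phi \vdash v_2 : \mathrm{int}_{=}(i)}
     {\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{v = \mathrm{select}(v_1, v_2)\}\ (\rho;\ \mathcal{H};\ (\phi \setminus v),\ v : \tau;\ cc)}
\quad \text{InstrSelect}
\]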
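\[
\frac{\mathit{LRT};\ \rho;\ \phi \vdash v' : \mathrm{hastag}(\tau_{tag}, \tau_u) \qquad \phi' = (\phi \setminus v),\ v : \mathrm{int}_{=}(\tau_{tag})}
     {\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{v = \mathrm{gettag}(v')\}\ (\rho;\ \mathcal{H};\ \phi';\ cc)}
\quad \text{InstrGettag}
\]

\[
\frac{\begin{array}{c}
      \mathit{LRT};\ \rho;\ \phi \vdash v : \tau_i \qquad \mathit{LRT};\ \rho;\ \phi \vdash v_i : \mathrm{int}_{=}(i) \\
      0 \le i < n \qquad m' = \max(m, i) \qquad \tau' = \tau \cap \mathrm{field}(4i, \tau_i)
      \end{array}}
     {\mathit{LRT} \vdash (\rho;\ (n, m, \tau);\ \phi;\ cc)\ \{\mathrm{init}(v_i, v)\}\ (\rho;\ (n, m', \tau');\ \phi;\ cc)}
\quad \text{InstrInit}
\]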
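
The way the initialization rule threads the allocation state (n, m, τ) can be sketched in Python as follows. The encoding is an assumption for illustration: τ is represented as a tuple of `("field", offset, type)` conjuncts standing for an intersection (the empty tuple playing the role of the top type), registers are looked up in a dictionary, and the side condition on the index is read as 0 ≤ i < n.

```python
def typecheck_init(alloc, phi, v_i, v):
    """Type init(v_i, v): store v into word i of the record being built.

    alloc is (n, m, fields): n words available, m the highest index
    initialised so far, fields a tuple of ("field", offset, type) conjuncts.
    Returns the new allocation state; phi is unchanged by this instruction.
    """
    n, m, fields = alloc
    index_type = phi[v_i]
    if not (isinstance(index_type, tuple) and index_type[0] == "int="):
        raise TypeError("index must have a singleton integer type")
    i = index_type[1]
    if not (0 <= i < n):          # assumed reading of the side condition
        raise TypeError("index outside the tested allocation region")
    new_fields = fields + (("field", 4 * i, phi[v]),)   # tau /\ field(4i, tau_i)
    return (n, max(m, i), new_fields)

phi = {"r1": ("int=", 0), "r2": ("int32",), "r3": ("int=", 2)}
state = typecheck_init((3, -1, ()), phi, "r1", "r2")
state = typecheck_init(state, phi, "r3", "r2")
print(state)
```

Tracking the highest initialised index alongside the accumulated field types is what later lets the allocation-pointer increment be checked against the words actually written.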


LRT  h  (p;  (n,  m,  t)]  0;  cc)  {v 


record}  (p;  M*]  (<f)\v),v  :  t]  cc) 


InstrRecord 


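\[
\frac{\mathit{LRT};\ \rho;\ \phi \vdash v : \mathrm{int}_{=}(n') \qquad m < n' < n}
     {\mathit{LRT} \vdash (\rho;\ (n, m, \tau);\ \phi;\ \mathrm{cc\_testmem}(k))\ \{\mathrm{inc\_allocptr}(v)\}\ (\rho;\ (n - n', -1, \top);\ \phi;\ \mathrm{cc\_none})}
\quad \text{InstrIncAllocptr1}
\]

\[
\frac{\mathit{LRT};\ \rho;\ \phi \vdash v : \mathrm{int}_{=}(n') \qquad m < n' < n \qquad cc \ne \mathrm{cc\_testmem}(k)}
     {\mathit{LRT} \vdash (\rho;\ (n, m, \tau);\ \phi;\ cc)\ \{\mathrm{inc\_allocptr}(v)\}\ (\rho;\ (n - n', -1, \top);\ \phi;\ cc)}
\quad \text{InstrIncAllocptr2}
\]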


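\[
\frac{\begin{array}{c}
      \mathit{LRT};\ \rho;\ \phi \vdash v : \mathrm{codeptr}([\alpha_1 : \kappa_1, \ldots, \alpha_j : \kappa_j](m, cc', v_1 : \tau_1', \ldots, v_n : \tau_n')) \\
      \mathit{LRT};\ \rho;\ \phi \vdash v_i : \tau_i'[\sigma] \quad (\text{for all } 1 \le i \le n) \\
      \sigma = \tau_1 \cdot \tau_2 \cdot \ldots \tau_j \cdot \mathit{id} \qquad cc = cc'[\sigma] \\
      \mathcal{H} = (n_1, n_2, \tau) \qquad n_1 \ge m[\sigma]
      \end{array}}
     {\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{\mathrm{call}(v, [\tau_1, \ldots, \tau_j])\}\ (\_;\ \_;\ \_;\ \_)}
\quad \text{InstrCall}
\]

\[
\frac{\begin{array}{c}
      \mathit{LRT} \vdash l : \mathrm{codeptr}([\alpha_1 : \kappa_1, \ldots, \alpha_j : \kappa_j](m, cc', v_1 : \tau_1', \ldots, v_n : \tau_n')) \\
      \mathit{LRT};\ \rho;\ \phi \vdash v_i : \tau_i'[\sigma] \quad (\text{for all } 1 \le i \le n) \\
      \sigma = \tau_1 \cdot \tau_2 \cdot \ldots \tau_j \cdot \mathit{id} \qquad cc = cc'[\sigma] \\
      \mathcal{H} = (n_1, n_2, \tau) \qquad n_1 \ge m[\sigma]
      \end{array}}
     {\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{\mathrm{calln}(l, [\tau_1, \ldots, \tau_j])\}\ (\_;\ \_;\ \_;\ \_)}
\quad \text{InstrCalln}
\]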


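\[
\frac{}{\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{\mathrm{cmp}(v_1, v_2)\}\ (\rho;\ \mathcal{H};\ \phi;\ \mathrm{cc\_none})}
\quad \text{InstrCmp}
\]

\[
\frac{\mathit{LRT};\ \rho;\ \phi \vdash v_1 : \mathrm{int}_{=}(\tau_1) \qquad \mathit{LRT};\ \rho;\ \phi \vdash v_2 : \mathrm{int}_{=}(\tau_2)}
     {\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{\mathrm{cmpcc}(v_1, v_2)\}\ (\rho;\ \mathcal{H};\ \phi;\ \mathrm{cc\_cmp}(\tau_1, \tau_2))}
\quad \text{InstrCmpcc}
\]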


LRT;p]<f>  \~  v  :  t  0'  =  (( i>\v),v '  :  int=(a)  fir  cc'  =  cc_testbox(o;) 
LRT  b  (p;  0 ;  cc)  {(a,  v')  =  testbox(v)}  (p;  0';  cc') 


InstrTestbox 


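\[
\frac{0 < n < 1024 \qquad cc' = \mathrm{cc\_testmem}(n)}
     {\mathit{LRT} \vdash (\rho;\ \mathcal{H};\ \phi;\ cc)\ \{\mathrm{testmem}(n)\}\ (\rho;\ \mathcal{H};\ \phi;\ cc')}
\quad \text{InstrTestmem}
\]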


LRT ;  p;  0;  cc  b^  0  LRT ;  p;  0;  cc  b^  /2 

LRT  b  (p;  0;  cc)  {if  (7r)  then  /i  else  /2}  (_;  _) 


Instrlf 


cc  =  cc_testmem(n)  LRT ;  p;  0;  cc  b^ 

LRT-p ;  (n,  -1,  T);  0;  cc  b£  /2 

LRT  b  (p;  0;  cc)  {iffull  then  l\  else  /2}  (_;  _;_;_)  ^ 
LRT ;  p;  0  b  v  :  int=(rQ)  fl  (int=(0)  U  . . .  U  int=(n  —  1)  U  r')
r'  —  Ti  U  t2  U  . . .  U  Tm  cc  =  cc_testbox(Ta)  n  <  256 
Tj  =  (field (0,  int=(tapj)))  fl  t[  (for  all  1  <  i  <  m) 

LRT ;  p;  Jtf;  0,  V\  :  r';  cc  b^  l\ 

LRT ;  p;  q b,  v2  :  range(0,  n);  cc  b^  l2 

LRT  b  (p;  0;  cc)  {ifboxed  (v)  then  (v\,  h)  else  (v2,  l2)}  (_; 

LRT;  p;  (f>  \~  v  :  hastag(rQ,  ru) 
cc  =  cc_cmp(ra,  i) 

T  —  Ti  U  T2  U  .  .  .  U  Tn 

Ti  =  field (0,  int  =(tagi))  fl  r/  (for  all  1  <  i  <  n) 

Tt  =  Ul<,<n  Tj  where  i  n  tagj  holds 
Tf  =  IJi<fc<n  R  where  i  i r  tagk  does  not  hold 
LRT ;  p;  Jtf;  0,  V\  :  (field (0,  int=(ra)))  fl  rt;  cc  b^  l\ 

LRT ;  p;  0,  c2  :  (field (0.  int=(ra)))  fl  Tj ;  cc  b 1 12 
LRT  b  (p;  0;  cc)  {iftag  (7r)  {;c}  then  (v\,  h)  else  (v2,  l2 )}  (_; 


Instrlfboxed 


Instrlftag 